Background

Traumatic brain injury (TBI) causes an enormous health and economic burden around the world [1]. Patients with moderate and severe TBI are at high risk for poor outcomes and often require intensive care unit (ICU) admission. In these patients, evidence-based treatment options are scarce and large differences in outcome and daily ICU practice exist [2,3,4,5].

Research to establish more evidence-based and thereby uniform treatment policies for patients with TBI has high priority. Still, breakthrough intervention strategies are scarce [6] and guideline recommendations remain limited. Therefore, new strategies, such as precision medicine and routine quality measurement, are being explored to drive research and clinical practice forward [1]. Routine quality measurement using appropriate indicators can guide quality improvement, for example, by identifying best practices and informing internal quality improvement initiatives. The potential of quality indicators to improve care has already been demonstrated in other clinical areas [7], in other ICU populations such as sepsis [8] or stroke patients [9], and in children with TBI [10, 11].

However, there are also examples of quality indicators that do not positively affect the quality of care. This may be for various reasons, such as lack of validity and reliability, poor data quality, or lack of support by clinicians [12,13,14]. Deploying poor indicators imposes opportunity costs through administrative burden and can distort healthcare priorities. Evaluation of a putative quality indicator is inherently multidimensional: when indicators are used to identify best practice or to benchmark hospitals, validity, reliability, and uniform definitions are all essential [15, 16].

Although some quality indicator sets for the general ICU exist [17, 18], there are no consensus-based quality indicators specific to the treatment of adult patients with TBI. Delphi studies have been proposed as a first step in the development of quality indicators [19]. The systematic Delphi approach gathers input from experts across locations and fields of expertise to reach group consensus while avoiding groupthink, thereby aiming to ensure broad and unbiased participation [19].

The aim of this study was to develop a consensus-based European quality indicator set for patients with TBI at the ICU and to explore barriers and facilitators for implementation of these quality indicators.

Methods

This study was part of the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) project [20].

An Advisory Committee (AC) was convened, consisting of 1 neurosurgeon (AM), 3 intensivists (MJ, DM, GC), 1 emergency department physician (FL), and 3 TBI researchers (HL, ES, LW) from 5 European countries. The AC’s primary goals were to advise on the recruitment of the Delphi panel, to monitor the Delphi process, and to interpret the final Delphi results. During a face-to-face meeting (September 2017), the AC agreed that the Delphi study would initially be restricted to Europe, recruit senior professionals as members of the Delphi panel, and focus on the ICU. The restriction to a European rather than a global set was motivated by substantial continental differences in health funding systems, health care costs, and health care facilities; the set was intended to be generalizable across Europe, and the Delphi panel was therefore recruited from European centers. The AC targeted senior professionals as Delphi panelists because they were expected to have more specialized and extensive clinical experience with TBI patients at the ICU. The AC decided to focus the indicator set on ICU practice, since ICU mortality rates are high (around 40% in patients with severe TBI [21]), large variation in daily practice exists [2,3,4,5, 22], and detailed data collection is generally more feasible in the ICU setting due to available patient data management systems or electronic health records (EHRs). We focused on adult patients with TBI.

Delphi panel

The AC identified 3 stakeholder groups involved in ICU quality improvement: (1) clinicians (physicians and nurses) primarily responsible for ICU care, (2) physicians from specialties other than intensive care medicine who are regularly involved in the care of patients with TBI at the ICU, and (3) researchers/methodologists in TBI research. Managers, auditors, and patients were excluded as stakeholders, since completion of the questionnaires required specific clinical knowledge. The prerequisite for participation was a minimum of 3 years of professional experience at the ICU or in TBI research. Stakeholders were recruited from the personal network of the AC (also through social media), among the principal investigators of the CENTER-TBI study (contacts from more than 60 NeuroTrauma centers across 22 countries in Europe) [20], and from a European publication on quality indicators at the ICU [18]. These experts were asked to provide additional contacts with sufficient professional experience.

Preliminary indicator set

Before the start of the Delphi process, a preliminary set of quality indicators was developed by the authors and the members of the AC, based on international guidelines (Brain Trauma Foundation [23] and Trauma Quality Improvement Program guidelines [24]), ICU practice variation [3,4,5], and clinical expertise (Additional file 1: Questionnaire round 1). Quality indicators were categorized into structure, process, and outcome indicators [25]. Overall, due to the absence of high-quality evidence on which thresholds to use in TBI management, we refrained from formulating quality indicators in terms of thresholds. For example, we did not use specific carbon dioxide (CO2) or intracranial pressure (ICP) thresholds to define quality indicators for ICP-lowering treatments.

Indicator selection

The Delphi process was conducted using online questionnaires (Additional files 1, 2, and 3). In the first round, the AC rated the preliminary quality indicators on four criteria: validity, discriminability (the ability to distinguish differences in center performance), feasibility (regarding the data collection required), and actionability (providing clear directions on how to change TBI care or otherwise improve scores on the indicator) [26,27,28,29,30] (Table 1). We used a 5-point Likert scale ranging from strongly disagree (1) to strongly agree (5). Additionally, an “I don’t know” option was provided to capture uncertainty. Agreement was defined as a median score of 4 (agreement) or 5 (strong agreement) on all criteria. Disagreement was defined as a median score below 4 on at least one of the four criteria [31, 32]. Consensus was defined as an interquartile range (IQR) ≤ 1 (strong consensus) on validity—since validity is considered the key characteristic of a useful indicator [19]—and IQR ≤ 2 (consensus) on the other criteria [31, 32]. The criteria for rating the indicators and the definitions of consensus remained the same during all rounds. The AC could give recommendations for indicator definitions at the end of the questionnaire. Indicators were excluded from the second Delphi round when there was consensus on disagreement on at least one criterion, unless important comments for improvement of the indicator definition were made. Such indicators with improved definitions were rerated in the next Delphi round.
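To make the selection rule concrete, the decision logic above can be sketched in code. This is a minimal illustration under the stated definitions only; the function names are our own, and the quartile method is an assumption (the study's analyses were done in R, whose default quantile algorithm corresponds to the "inclusive" method used here):

```python
from statistics import median, quantiles

def median_iqr(scores):
    """Median and interquartile range of Likert scores (1-5).
    Assumes "I don't know" responses were already dropped as missing."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1

def delphi_decision(criteria_scores):
    """Apply the selection rule described above.
    criteria_scores maps each of the four criteria ("validity", "feasibility",
    "discriminability", "actionability") to the panel's Likert scores.
    Agreement: median of 4 or 5 on all four criteria.
    Consensus: IQR <= 1 on validity, IQR <= 2 on the other criteria."""
    stats = {c: median_iqr(s) for c, s in criteria_scores.items()}
    agreement = all(med >= 4 for med, _ in stats.values())
    consensus = stats["validity"][1] <= 1 and all(
        iqr <= 2 for c, (_, iqr) in stats.items() if c != "validity"
    )
    if agreement and consensus:
        return "include"   # agreement and consensus: into the final set
    if (not agreement) and consensus:
        return "exclude"   # consensus on disagreement: dropped (unless redefined)
    return "rerate"        # no consensus: carried into the next round
```

For example, an indicator with medians of 4–5 and tight spreads on all four criteria would be classified as "include", whereas a wide spread of validity scores would send it to the next round for rerating.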

Table 1 Selection criteria used to rate the quality indicators

In the second round, the remaining indicators were sent to a larger group of experts. The questionnaire started with a description of the goals of the study, and experts were asked to provide some background characteristics. Experts could suggest adaptations to indicator definitions at the end of each group of indicators on a certain topic (domain). Indicators were included in the final set when there was agreement and consensus, excluded when there was disagreement and consensus, and carried over to the next round when no consensus was reached or important comments to improve the indicator definitions were given. As many outcome scales exist for TBI, such as the Glasgow Outcome Scale Extended (GOSE), Coma Recovery Scale Revised (CRS-R), and Rivermead Post-Concussion Symptoms Questionnaire (RPQ), a separate ranking question was used to determine which outcome scales were preferred (or most important) to use as outcome indicators, to avoid an overly extensive outcome indicator set (Additional file 2, question outcome scales). The outcome scales that received the highest ratings (top 3) were selected for round 3 and rated as described above. Finally, exploratory questions were asked about the goals or reasons for which experts would implement the quality indicators. Only experts who completed the full questionnaire were invited to the final round.

In the last round, the expert panel could only rate the indicators; adding new indicators or suggesting further changes to definitions was no longer possible. Experts received both qualitative and quantitative feedback on the rating of each individual indicator (medians and IQRs) from round 2. Final exploratory questions were asked about barriers and facilitators for implementation of the indicator set. In each Delphi round, three automated reminder emails and two personal reminders were sent to each Delphi participant to ensure a high response rate.

Statistical analysis

Descriptive statistics (median and interquartile range) were calculated to determine which indicators were selected for the next round and to present quantitative feedback (median and min–max rates) in the third Delphi round. “I don’t know” responses were coded as missing. A sensitivity analysis after round 3 was performed to determine the influence of experts from Western Europe compared with other European regions on indicator selection (inclusion in or exclusion from the final set). Statistical analyses were performed using the R statistical language [33]. Questionnaires were developed using the open-source LimeSurvey software [34], in which multiple online questionnaires can be developed and sent by email, response rates can be tracked, and questionnaire scores or responses can easily be exported to a statistical program.
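As a sketch of the descriptive analysis (hypothetical helper names; the actual analysis was performed in R), the missing-data coding and the Western Europe sensitivity split could look as follows:

```python
from statistics import median, quantiles

def median_iqr(ratings):
    """Median and IQR of Likert ratings, with "I don't know" coded as
    missing and dropped, as described in the statistical analysis above."""
    vals = [float(r) for r in ratings if r != "I don't know"]
    q1, _, q3 = quantiles(vals, n=4, method="inclusive")
    return median(vals), q3 - q1

def by_region(ratings, regions):
    """Illustrative sensitivity split: summarize ratings separately for
    Western Europe and other European regions (labels are assumptions)."""
    west = [r for r, g in zip(ratings, regions) if g == "Western Europe"]
    rest = [r for r, g in zip(ratings, regions) if g != "Western Europe"]
    return {"Western Europe": median_iqr(west), "Other regions": median_iqr(rest)}
```

Comparing the two subgroup summaries per indicator then shows whether the inclusion decision would differ between Western Europe and the rest of Europe.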

Results

Delphi panel

The Delphi rounds were conducted between March 2018 and August 2018 (Fig. 1). Approximately 150 experts were invited for round 2, and 50 experts from 18 countries across Europe responded (≈ 33%). Most were intensivists (N = 24, 48%), followed by neurosurgeons (N = 7, 14%), neurologists (N = 5, 10%), and anesthesiologists (N = 5, 10%) (Table 2). Most experts reported 15 or more years of experience with patients with TBI at the ICU or another department (N = 25, 57%). Around half of the experts indicated that they had primary responsibility for the daily practical care of patients with TBI at the ICU (N = 21, 47%). Experts were employed in 37 centers across 18 European countries, mostly in Western Europe (N = 26, 55%). Most experts were from academic (N = 37, 84%) trauma centers in an urban location (N = 44, 98%). Almost all experts indicated the availability of EHRs in their ICU (N = 43, 96%). Thirty-one experts (63%) participated in the CENTER-TBI study. The response rate in round 3 was 98% (N = 49).

Fig. 1
figure 1

Overview of the Delphi process: time frame, experts’ involvement, and indicator selection; *8 indicators were removed based on the sensitivity analyses. The left side of the figure shows the number of indicators that were removed after disagreement and consensus with no comments to improve definitions, as well as the number of changed indicator definitions. The right side of the figure shows the number of newly proposed indicators (which were rerated in the next Delphi round) and the number of indicators that were included in the final indicator set. After round 2, 17 indicators were included in the final set (and removed from the Delphi process), and after round 3, 25 indicators were included in the final set—a total of 42 indicators. Agreement was defined as a median score of 4 (agreement) or 5 (strong agreement) on all four criteria (validity, feasibility, discriminability, and actionability). Disagreement was defined as a median score below 4 on at least one of the four criteria. Consensus was defined as an interquartile range (IQR) ≤ 1 (strong consensus) on validity—since validity is considered the key characteristic of a useful indicator [19]—and IQR ≤ 2 (consensus) on the other criteria

Table 2 Baseline characteristics Delphi panel

Indicator selection

The first Delphi round started with 66 indicators (Fig. 1). In round 1, 22 indicators were excluded; the main reason for exclusion was poor agreement (median < 4) on all criteria except discriminability (Additional file 4). Round 2 started with 46 indicators; 17 were directly included in the final set and 7 were excluded, mainly due to poor agreement (median < 4) on actionability and poor consensus (IQR > 1) on validity. Round 3 started with 40 indicators, of which 25 were included in the final set. Eight indicators were excluded based on the sensitivity analysis (no consensus in Western Europe versus other European regions), and 7 indicators had low agreement on actionability or no consensus on validity or actionability. During the full Delphi process, 20 new indicators were proposed, and 30 definitions were discussed and/or modified.

The final quality indicator set consisted of 42 indicators on 13 clinical domains (Table 3), including 17 structure indicators, 16 process indicators, and 9 outcome indicators. For the domains “precautions ICP monitoring,” “sedatives,” “osmotic therapies,” “seizures,” “fever,” “coagulopathy,” “respiration and ventilation,” and “red blood cell policy,” no indicators were included in the final set.

Table 3 Finally proposed set of clinical quality indicators in traumatic brain injury at the ICU

Experts proposed changing the names of the “short-term outcomes” and “long-term outcomes” domains to “in-hospital outcomes” and “after discharge or follow-up outcomes.” In round 2, the Glasgow Outcome Scale Extended (GOSE), Quality of Life after Brain Injury (Qolibri), and Short Form health survey (SF-36) were rated the best outcome scales. However, the Qolibri was excluded in round 3 as an outcome indicator, since there was no consensus in the panel on its validity to reflect the quality of ICU care. A plurality of experts (N = 14, 28%) indicated that the outcome scales should be measured at 6 months, closely followed by experts who indicated both 6 and 12 months (N = 13, 26%).

Barriers and facilitators for implementation

Almost all experts indicated that the indicator set should be used in the future (N = 49, 98%). One expert did not believe an indicator set should be used at all, because it would poorly reflect the quality of care (N = 1, 2%).

The majority of experts indicated that the set could be used for registry purposes (N = 41, 82%), assessment of adherence to guidelines (N = 35, 70%), and quality improvement programs (N = 41, 82%). Likewise, the majority of experts indicated that the indicator set could be used for benchmarking purposes (N = 42, 84%), both within and between centers. Pay for performance was rarely chosen as a future goal (N = 3, 6%). Almost all experts indicated administrative burden as a barrier (N = 48, 98%). Overall, experts endorsed facilitators more than barriers for implementation (Fig. 2).

Fig. 2
figure 2

Facilitators or barriers for implementation of the quality indicator set. Percentage of experts who indicated a certain facilitator or barrier for implementation of the quality indicator set. The other facilitator indicated was “create meaningful uniform indicators.” Other barriers indicated were “gaming” (N = 1, 2%) and “processes outside of the ICU (e.g., rehabilitation) are hard to query.” *Participation in trauma quality improvement program

Discussion

Main findings

This three-round European Delphi study, including 50 experts, resulted in a quality indicator set of 42 indicators with a high level of consensus on validity, feasibility, discriminability, and actionability, representing 13 clinical domains for patients with TBI at the ICU. Experts indicated multiple facilitators for implementation of the total set, while the main barrier was the anticipated administrative burden. The selection of indicators during the Delphi process gave insight into which quality indicators were perceived as important to improve the quality of TBI care. In addition, the indicator definitions evolved during the Delphi process, leading to a final set of indicators that clinical experts found understandable and easy to interpret. This set serves as a starting point to gain insight into current ICU care for TBI patients, and after empirical validation, it may be used for quality measurement and improvement.

Our Delphi resulted in 17 structure indicators, 16 process indicators, and 9 outcome indicators. A large number of structure indicators already reached consensus after round 2, which might reflect that these were more concise indicators. During the subsequent rounds, however, definitions for process indicators became more precise and specific. Process indicators must be evidence-based before best practices can be determined, which might also explain why important domains with indicators on daily care in TBI (such as decompressive craniectomy, osmotic therapies, and respiration and ventilation management) did not reach consensus in our Delphi study. Structure, process, and outcome indicators each have their own advantages and disadvantages. For example, process indicators tend to be inherently actionable compared with structure and outcome indicators, yet outcome indicators are more relevant to patients [35]. Most indicators were excluded from the set due to low agreement and lack of consensus on actionability and validity, indicating that experts highly valued the practicality and usability of the set and were strict in selecting only those indicators that might improve patient outcome and processes of care. Overall, the complete set comprises all different types of indicators.

Existing indicators

Some national ICU registries already exist [17], and in 2012, a European quality indicator set for general ICU care was developed [18]. In addition, several trauma databanks already exist [36, 37]. The motivations for selection (or rejection) of indicators in our study can contribute to the ongoing debate on which indicators to collect in these registries. For example, length of stay is often used as an outcome measure in current registries, but the Delphi panel commented that length of stay is debatable as an indicator, since hospital structures differ (e.g., step-down units are not standard) and admission length can be confounded by (ICU) bed availability. Although general ICU care is essential for TBI, not all general ICU or trauma indicators are applicable in exactly the same way to TBI. For example, individualized deep venous thrombosis prophylaxis management is a priority in TBI in view of the risk of progressive brain hemorrhage, in contrast to other ICU conditions (e.g., sepsis). Therefore, our TBI-specific indicator set might form a useful addition to current registries.

Strength and limitations

This study has several strengths and limitations. No firm rules exist on how to perform a Delphi study for the development of quality indicators [19]. Therefore, we extensively discussed the methodology and determined strategies with the Advisory Committee. Although the RAND/UCLA Appropriateness Method recommends a panel meeting [38], no group discussion took place in our study, both to avoid overrepresentation of strong voices and for reasons of feasibility. However, experts received both qualitative and quantitative information on the rating of indicators to gain insight into the thinking process of the other panel members. Regarding the preliminary indicator set, we used the guidelines [23, 24] as a guide to which topics should be included, not as an evidence base. Regarding the Delphi panel, the success of indicator selection depends on the expertise of the invited members: we assembled a large network of 50 experts from 18 countries across Europe with various professional backgrounds. All participants can be considered established experts in the field of TBI research and/or daily clinical practice (around 70% of experts had more than 10 years of ICU experience). However, more input from some key stakeholders in the quality of ICU care, such as rehabilitation physicians, nurses and allied health practitioners, health care auditors, and TBI patients, would have been preferable. We had only three rehabilitation experts on our panel, but increased input from this group of professionals would have been valuable, since they are increasingly involved in the care of patients even at the ICU stage. A number of nurses were invited, but none responded, possibly because relatively few were invited. This is a severe limitation, since nurses play a key role in ICU quality improvement and quality indicator implementation [39, 40]. Future studies should therefore invest more effort in involving nurses in quality indicator development. Experts were predominantly from Western Europe.
Therefore, we performed sensitivity analyses for Western Europe and removed indicators with significant differences compared with other regions to obtain a set generalizable across Europe. Finally, some of the responses may have been strongly influenced by familiarity with measures (e.g., SF-36 was selected instead of Qolibri) rather than solely reflecting the value of the measure per se.

Use and implementation

Quality indicators may be used for the improvement of care in several ways. First, registration of indicator data itself will make clinicians and other stakeholders aware of their center or ICU performance, as indicators provide objective data on care instead of perceived care. Second, as the evidence base for guidelines is often limited, this indicator set could support refinement of guideline recommendations. This was shown in a study by Vavilala et al., in which guideline-derived indicators for the acute care of children with TBI were collected from medical records and were associated with improved outcome [10]. Third, quality indicators can be used to guide and inform quality improvement programs. One study showed that a TBI-specific quality improvement program was effective, demonstrating lower mortality rates after implementation [41]. Fourth, (international) benchmarking of quality indicators will facilitate discussion between (health care) professionals and direct attention towards suboptimal care processes [17]. Future benchmarking across different hospitals or countries requires advanced statistical analyses, such as random effects regression models, to correct for random variation and case mix. To perform such benchmarking, case-mix variables must be collected, as in general ICU prognostic models or the TBI-specific prognostic models IMPACT and CRASH [42, 43].

A quality indicator set is expected to be dynamic: ongoing large international studies will further shape the quality indicator set. This is also reflected in the “retirement” of indicators over time (when 90–100% adherence is reached). Registration and use of the quality indicators will provide increasing insight into their feasibility and discriminability and provide the opportunity to study their validity and actionability. Such empirical testing of the set will probably reveal that not all indicators meet the required criteria and will thus reduce the number of indicators in the set, which is desirable, as the set is still quite extensive. For now, given the dynamic nature of the set and ongoing TBI studies, we recommend using this consensus-based quality indicator set for registry purposes—to gain insight over time into current care—and not for changing treatment policies. We therefore recommend regarding this consensus-based quality indicator set as a starting point in need of further validation before broad implementation can be recommended. Such validation should seek to establish whether adherence to the quality indicators is associated with better patient outcomes.

To provide feedback on clinical performance, new interventions are being explored to further increase the effectiveness of indicator-based performance feedback, e.g., direct electronic audit and feedback with suggested action plans [44]. A single (external) organization for data collection could enhance the participation of multiple centers and could also reduce the administrative burden for clinicians. This is a critical issue, since administrative burden was indicated as the main barrier to implementation of the whole indicator set, although experts agreed on the feasibility of individual indicators. In the future, automatic data extraction might be the solution to overcome the administrative burden. International collaborations must be encouraged, and further endorsement by scientific societies seems necessary before large-scale implementation is feasible. If large-scale implementation is to become global, there is an urgent need to develop quality indicators for low-income countries [36, 45].

Conclusion

This Delphi consensus study gives insight into which quality indicators have the potential to improve the quality of TBI care at European ICUs. We recommend using the proposed quality indicator set across Europe for registry purposes, to gain insight into current ICU practices and outcomes of patients with TBI. This indicator set may become an important tool to support benchmarking and quality improvement programs for patients with TBI in the future.