Introduction

Cost-effectiveness analysis (CEA) is increasingly used to inform health decision-making, with the consequence that its methods and assumptions are exposed to critical review [1]. Criticism may be hard to avoid because cost-effectiveness models generally represent a simplification of the real world in which health interventions are delivered and patients are treated. George Box’s aphorism that ‘all models are wrong but some are useful’ is highly relevant in this field [2]; some simplification is essential because the real world is too complex to be represented completely in a tractable model. Researchers generally adopt a rigorous approach to identifying estimates of intervention effectiveness, which are usually drawn from well-designed and well-conducted randomized controlled trials, or from systematic reviews and meta-analyses. There is less consistency of approach to the collection of epidemiological data to populate a CEA model, including the costs associated with model health states and the probabilities of transitioning between states. Modelling studies often rely on data inputs assembled from multiple secondary sources, typically identified through literature or ‘scoping’ reviews and supplemented by additional assumptions. In studies of population sub-groups, estimates may be derived from small non-probability samples. These processes could compromise both the precision and the external validity of model-based estimates of cost-effectiveness.

Researchers can increasingly gain access to large population-based datasets compiled from electronic health records (EHRs) and administrative sources, often referred to generically as routinely collected or ‘real-world’ data. Empirical analysis of large datasets offers a coherent and transparent approach to populating an economic model with data that are more representative and more precise, but this approach is only beginning to be used. This commentary discusses ways to capitalize on the strengths of EHRs in economic evaluations, drawing on case studies in obesity and ageing [3, 4]. We also highlight methodological questions that remain to be addressed.

Primary care and linked electronic health records data

Primary care electronic records are important because of their population coverage and longitudinal continuity. In the United Kingdom (UK), more than 98% of individuals are registered with a National Health Service family physician, so UK primary care EHRs are considered population-based. The Clinical Practice Research Datalink (CPRD) is one example of a primary care EHR database. Established more than 30 years ago, CPRD now includes some 11.3 million patients from almost 680 general practices, with currently registered patients equivalent to almost 7% of the UK population, although the number of participating practices has varied over time. The CPRD includes general practices from England, Scotland, Wales and Northern Ireland, and the registered population is considered broadly representative of the UK population, both geographically and socio-demographically [5, 6]. Individual patients in the UK commonly remain registered with one general practice over time, and longitudinal records in CPRD may span many years, from 1990 to the present. For entry into CPRD, general practices must meet specific data quality criteria to ensure that data are ‘up to standard’ for research. Access to CPRD data requires either an annual institutional license for online access or a fee per dataset, as well as scientific and ethical approval from regulatory oversight bodies.

Data in primary care databases are assembled in a relational file structure, with separate files containing records of clinical consultations (including medical diagnoses and referrals to hospital), therapy records (including all prescriptions issued by the practice), test records, and additional records of smoking, weight and height (Fig. 1). Three quarters of CPRD general practices in England participate in linkage to secondary care data (Hospital Episode Statistics), mortality records (Office for National Statistics), deprivation category (Index of Multiple Deprivation) and cancer registrations [7, 8].
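To make the relational structure concrete, a per-patient history can be assembled by linking the separate files on a shared anonymised patient identifier. This is a minimal sketch in plain Python with toy records; the field names (`patid`, `event_date`, `code`, `issue_date`, `product`) and the helper `build_patient_histories` are hypothetical and are not taken from the CPRD file specification.

```python
from collections import defaultdict

# Hypothetical extracts: each "file" is a list of records keyed by an
# anonymised patient identifier. Field names are illustrative only.
patients = [
    {"patid": 1, "birth_year": 1935, "sex": "F"},
    {"patid": 2, "birth_year": 1950, "sex": "M"},
]
clinical = [
    {"patid": 1, "event_date": "2015-03-02", "code": "stroke_nos"},
    {"patid": 2, "event_date": "2016-07-11", "code": "type2_diabetes"},
]
therapy = [
    {"patid": 1, "issue_date": "2015-03-10", "product": "clopidogrel_75mg"},
    {"patid": 1, "issue_date": "2015-06-08", "product": "clopidogrel_75mg"},
]

def index_by_patient(records):
    """Group the records of one file by patient identifier."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["patid"]].append(rec)
    return grouped

def build_patient_histories(patients, **files):
    """Attach each file's records to the matching patient (a left join on patid)."""
    indexed = {name: index_by_patient(recs) for name, recs in files.items()}
    histories = {}
    for p in patients:
        history = dict(p)  # copy the patient-level fields
        for name, grouped in indexed.items():
            # Patients with no records in a file get an empty list
            history[name] = grouped.get(p["patid"], [])
        histories[p["patid"]] = history
    return histories

histories = build_patient_histories(patients, clinical=clinical, therapy=therapy)
print(len(histories[1]["therapy"]))
print(histories[2]["therapy"])
```

In practice the same join would usually be done in a database or a data-frame library; plain dictionaries are used here only to keep the example self-contained.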

Fig. 1 Illustration of primary care electronic health records and linked data with examples of estimates for health economic evaluation

Advantages of electronic health records for economic evaluation

Statistical analysis of large datasets from primary care EHRs and linked data provides precise and generalizable estimates of disease incidence and mortality that can be used in health economic models. These data sources support estimates for population sub-groups defined by age, gender or deprivation category, which are often not available in the existing literature (Fig. 1) [9]. EHR data also enable analysis of health care utilization in large samples of different patient groups outside trial settings. Consultations of different types (e.g. general practice, emergency, out-of-hours, inpatient, outpatient) can be combined with unit costs from reference sources to estimate the costs of resource use [4, 10]. Prescriptions can be enumerated over time using dose and frequency information, and a unit cost from a drug dictionary can then be applied to each. Recent studies of the cost-effectiveness of lifestyle advice in primary care, the costs and outcomes of increasing access to bariatric surgery, and the determinants of health care costs in the senior elderly exemplify these approaches [3, 4, 11].
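The costing step described above amounts to counting recorded contacts by type and multiplying each count by a unit cost. The following is a minimal sketch, assuming the contact categories listed in the text; the `UNIT_COSTS` values are hypothetical placeholders rather than figures from any reference source, and `cost_resource_use` is an illustrative helper.

```python
from collections import Counter

# Hypothetical unit costs (GBP) per contact type. A real analysis would take
# these from a published reference source; these numbers are placeholders.
UNIT_COSTS = {
    "general_practice": 30.0,
    "emergency": 150.0,
    "out_of_hours": 60.0,
    "inpatient": 1500.0,
    "outpatient": 120.0,
}

def cost_resource_use(contacts, unit_costs=UNIT_COSTS):
    """Count contacts by type and multiply each count by its unit cost."""
    counts = Counter(contacts)
    unknown = set(counts) - set(unit_costs)
    if unknown:
        # Fail loudly rather than silently costing an unmapped contact at zero
        raise KeyError(f"No unit cost for: {sorted(unknown)}")
    return sum(n * unit_costs[t] for t, n in counts.items())

# One hypothetical patient-year of recorded contacts
year = ["general_practice"] * 6 + ["outpatient"] * 2 + ["emergency"]
print(cost_resource_use(year))  # 570.0
```

Summing these patient-level costs within sub-groups (for example by age band or deprivation category) would give the group-specific cost inputs that a model requires.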

When a population health policy has yet to be put into place, evidence for the health economic value of adopting the new policy must be developed. This is the case for extending bariatric surgery to populations beyond the current eligibility criteria. The National Institute for Health and Care Excellence (NICE) recommends coverage of bariatric surgery for patients with morbid obesity (BMI ≥ 40 kg/m2) and for those with severe obesity (BMI 35–39.9 kg/m2) who have at least one comorbidity. To examine whether it would be cost-effective to extend coverage to patients with severe obesity and no comorbidity, EHR data were used to evaluate the cost-effectiveness of bariatric surgery in this group using both literature-based and EHR-derived intervention effects [3, 12]. EHR data allow costs and outcomes to be modelled for a population that would otherwise not have had access to the intervention, using the incidence, prevalence, mortality and health care utilization observed in longitudinal patient records [3].
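Identifying the group of interest in the EHR starts from a classification of patients against the BMI criteria stated above. The sketch below assumes that BMI and a count of relevant comorbidities have already been derived for each patient; how comorbidities are ascertained from diagnostic codes is an assumption left outside the example, and the function name `bariatric_group` is hypothetical. It ignores the other clinical conditions that accompany the NICE recommendation.

```python
def bariatric_group(bmi, n_comorbidities):
    """Classify a patient against the BMI criteria described in the text.

    'eligible'     : BMI >= 40, or BMI 35-39.9 with at least one comorbidity
    'target'       : BMI 35-39.9 with no comorbidity (the extension group)
    'not_eligible' : BMI below 35
    """
    if bmi >= 40:
        return "eligible"
    if 35 <= bmi < 40:
        return "eligible" if n_comorbidities >= 1 else "target"
    return "not_eligible"

# Hypothetical patient records: (patient id, BMI in kg/m2, comorbidity count)
cohort = [(1, 42.1, 0), (2, 37.5, 2), (3, 36.0, 0), (4, 31.2, 1)]
target_ids = [pid for pid, bmi, n in cohort if bariatric_group(bmi, n) == "target"]
print(target_ids)  # [3]
```

The same classification applied to longitudinal records would let incidence, mortality and utilization be summarised separately for the currently eligible group and for the proposed extension group.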

Using EHR data can also allow additional outcomes that might not be collected in a trial to be examined in a modelling context, and to be followed up beyond the trial period. EHRs make it possible for researchers to evaluate outcomes and resource use across patients’ long-term interactions with the health care system. For example, EHR data allow analysis of the impact of bariatric surgery on depression, because patients are tracked beyond the length of any trial collecting such data [13]. If depression is not a specified trial outcome, a population-based estimate of the post-operative change in depression, generated from EHR data, can be layered onto literature-based estimates of surgery effects. After a trial ends, the outcome of interest can still be estimated from EHRs.
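One simple population-based measure of a post-operative change is an incidence rate ratio comparing person-time after surgery with person-time before it. This is a minimal sketch with hypothetical counts, using a Wald interval on the log scale; it is not necessarily the method used in the cited study [13], and a before-and-after comparison within the same patients ignores secular trends and treats the two periods as independent Poisson counts, which is only an approximation.

```python
import math

def incidence_rate(events, person_years):
    """Events per person-year of follow-up."""
    return events / person_years

def rate_ratio(events_post, py_post, events_pre, py_pre, z=1.96):
    """Post- vs pre-operative incidence rate ratio with an approximate 95% CI.

    Wald interval on the log scale, treating the two periods as independent
    Poisson counts (an approximation when they come from the same patients).
    """
    rr = incidence_rate(events_post, py_post) / incidence_rate(events_pre, py_pre)
    se_log = math.sqrt(1 / events_post + 1 / events_pre)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts of new depression records and person-years of follow-up
rr, lower, upper = rate_ratio(events_post=40, py_post=2000, events_pre=50, py_pre=2000)
print(f"IRR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A ratio of this kind could then be applied to the depression transition rates in a model, alongside literature-based estimates of the primary surgery effects.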

Trial participants are carefully selected, and trials often exclude people of advanced age or with comorbidity [14]. There may be considerable uncertainty concerning potential outcomes for vulnerable populations such as frail elderly individuals, who might not be able to participate safely in a trial but could represent a large proportion of the population relevant to the research question. Particularly in the elderly, but also in other population groups (e.g. children, and those for whom the sample size at a local study site is small), resource use estimates from EHRs will reflect real-world clinical practice more closely than resource use data collected in a controlled trial setting. With a rapidly ageing population, EHRs can offer a more pragmatic approach to modelling the cost-effectiveness of treatments in older populations [15].

Elderly individuals with multiple morbidity comprise a crucial group when assessing the value of treatments for older people, because the majority of those aged over 65 years will have multiple morbidity, defined as 4 or more chronic conditions [16]. These individuals may experience adverse events and use more health care resources than people with little or no morbidity, who are more often eligible to participate in trials. In this case it may be problematic to use clinical trials as the primary source of resource use data when resource use can be captured more accurately from EHRs. Because EHRs have been used to evaluate health care utilization and cost patterns among individuals aged from 80 to 100 years and older, these data can now serve as population-based estimates in economic models assessing the cost-effectiveness of interventions in the very elderly, including those with multiple morbidity [4]. Estimates of costs and outcomes from longitudinal EHRs can act either as a supplement to trial data or as the primary source of information for a model-based evaluation. These data may be combined with trial-based estimates of intervention effects to model potential outcomes, because relative risk estimates of intervention effect from clinical trials are expected to be more transportable between populations than absolute risks.
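Combining the two sources can be sketched as taking sub-group baseline event rates from the EHR and applying a single trial-derived relative effect to each. The age bands and rates in `BASELINE_RATES` are hypothetical, and the sketch assumes that the relative effect acts multiplicatively on a constant rate (proportional hazards) and is the same in every sub-group; that uniformity is exactly the transportability assumption described in the text.

```python
import math

# Hypothetical EHR-derived baseline event rates (events per person-year) by age band
BASELINE_RATES = {"80-84": 0.04, "85-89": 0.06, "90+": 0.09}

def rate_to_probability(rate, cycle_years=1.0):
    """Convert a constant rate to the probability of at least one event in a cycle."""
    return 1.0 - math.exp(-rate * cycle_years)

def transition_probabilities(baseline_rates, relative_effect, cycle_years=1.0):
    """Apply one trial-derived relative effect to each sub-group's baseline rate.

    Assumes the effect multiplies the rate (proportional hazards) and is
    identical across sub-groups, i.e. that the relative effect transports.
    """
    out = {}
    for group, rate in baseline_rates.items():
        out[group] = {
            "control": rate_to_probability(rate, cycle_years),
            "intervention": rate_to_probability(rate * relative_effect, cycle_years),
        }
    return out

probs = transition_probabilities(BASELINE_RATES, relative_effect=0.8)
for group, p in probs.items():
    print(group, round(p["control"], 4), round(p["intervention"], 4))
```

Applying the relative effect to the rate, rather than multiplying the probability directly, keeps the resulting probabilities within 0 and 1 even for sub-groups with high baseline risk.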

Characteristics of primary care EHR data

Making effective use of EHR data in economic evaluations requires an appreciation of the nuances of clinically recorded data, which may require working with clinicians and epidemiologists familiar with the clinical settings in which the data are recorded. Many studies have shown that primary care EHRs have high predictive value for capturing clinical diagnoses [17], but fewer studies have evaluated false-positive and false-negative diagnosis rates. Dregan et al. [7] compared primary care EHR data with cancer registrations for 42,556 patients with cancer diagnoses, finding that primary care EHRs had more than 99% specificity for a cancer diagnosis, with sensitivity ranging from 85 to 94% across four cancer sites. Recording errors were more frequent in a study of acute myocardial infarction, in which 15% of disease registry cases and 17% of hospital-recorded cases were not recorded in primary care records [18]. A further difficulty is the limited specificity of the codes selected by family physicians. In a 2009 study of stroke, many stroke events were coded with non-specific codes that did not distinguish haemorrhagic strokes from cerebral infarcts [19]. In chronic illnesses such as stroke, recurrent events may be difficult to ascertain because later occurrences of a diagnostic code may sometimes refer to the index event [20]. Diagnostic problems also arise when patients present with acute illnesses; for example, a substantial proportion of antibiotic prescriptions may be issued without a clear diagnosis being recorded [21]. Data recording and coding practices may also vary substantially between practices, between practitioners within a practice, and over time within the same practice [22].
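Validation studies of this kind cross-classify patients by whether a diagnosis appears in the EHR and in a reference source such as a registry. A minimal sketch, assuming each source is available as a set of patient identifiers and that the reference standard is complete and correct (itself an assumption); the function name and toy identifiers are hypothetical.

```python
def validation_metrics(ehr_cases, reference_cases, population):
    """Compare EHR-recorded cases with a reference standard (e.g. a registry).

    All arguments are sets of patient identifiers; both case sets are assumed
    to be subsets of the population.
    """
    tp = len(ehr_cases & reference_cases)                # recorded in both
    fp = len(ehr_cases - reference_cases)                # EHR only
    fn = len(reference_cases - ehr_cases)                # reference only (missed)
    tn = len(population - ehr_cases - reference_cases)   # in neither
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
    }

population = set(range(10))
metrics = validation_metrics({0, 1, 2, 3}, {0, 1, 2, 4}, population)
print(metrics)
```

Sensitivity and specificity estimated this way could feed a misclassification adjustment or a sensitivity analysis on the event rates used in a model.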

Prescriptions issued by a general practice are coded into the EHR using standard product dictionaries. This enables each prescription to be linked with a unique product price, allowing patient-level drug utilization patterns to be costed over time [4]. Drug dosage and duration of treatment cannot always be clearly established from EHR data, but our experience suggests that most prescriptions for chronic illness can be assumed to last for 90 days [23]. Some drug prescriptions may not be recorded at the patient’s general practice; these include antibiotic prescriptions issued by out-of-hours services, and biologic therapies and chemotherapy prescribed in hospital settings. At present, secondary care prescribing is not routinely available from UK EHRs, but relevant elements are sometimes captured by specialist registers and linked secondary data.
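Costing prescriptions and defining periods of exposure can be sketched as follows, using the 90-day default duration from the text whenever dose and frequency do not allow a duration to be derived. The product codes and `UNIT_PRICE` values are hypothetical stand-ins for a drug dictionary lookup, and the helper names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed duration when dose and frequency do not allow it to be derived
DEFAULT_DAYS = 90

# Hypothetical price per unit issued, standing in for a drug dictionary lookup
UNIT_PRICE = {"drug_a_10mg_tab": 0.05, "drug_b_50mg_tab": 0.20}

@dataclass
class Prescription:
    issue_date: date
    product: str                 # product code from the (hypothetical) dictionary
    quantity: int                # units issued, e.g. tablets
    days: Optional[int] = None   # duration, if derivable from dose and frequency

def prescription_cost(rx: Prescription) -> float:
    """Quantity issued multiplied by the unit price for that product."""
    return rx.quantity * UNIT_PRICE[rx.product]

def exposure_end(rx: Prescription) -> date:
    """End of the period covered, falling back to the default duration."""
    days = rx.days if rx.days is not None else DEFAULT_DAYS
    return rx.issue_date + timedelta(days=days)

def drug_cost(rxs, start: date, end: date) -> float:
    """Total cost of prescriptions issued in the interval [start, end)."""
    return sum(prescription_cost(rx) for rx in rxs if start <= rx.issue_date < end)

rxs = [
    Prescription(date(2020, 1, 6), "drug_a_10mg_tab", 84, days=84),
    Prescription(date(2020, 4, 1), "drug_a_10mg_tab", 84),   # duration unknown
    Prescription(date(2021, 1, 2), "drug_b_50mg_tab", 28),
]
print(drug_cost(rxs, date(2020, 1, 1), date(2021, 1, 1)))
print(exposure_end(rxs[1]))  # 2020-06-30
```

Keeping the explicit duration when it can be derived, and falling back to the default only otherwise, makes it easy to test how sensitive exposure estimates are to the 90-day assumption.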

Primary care EHRs include the results of clinical and laboratory tests and records of lifestyle behaviors. Analysis of these data can provide aggregate results consistent with estimates from national survey data for variables including smoking [24] and blood pressure [25, 26]. However, tests are not performed at regular intervals as they might be in a clinical trial; instead there is a risk of confounding by indication, because physicians order tests when there is a reason to do so. In a study of hospital electronic records, the presence and timing of a test result in a patient’s record were strongly associated with survival, independent of any information about the value of the result [27]. Consequently, missing values in a patient’s record will often arise from a ‘missing not at random’ mechanism. Optimal analytical approaches for EHRs will therefore consider both the data-generating mechanisms and the quality of the underlying data; appropriate methods to achieve this are only beginning to be developed [28].
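Because the presence and timing of a test are themselves informative, one simple option is to derive features that describe the observation process, such as whether a test was recorded in a look-back window, how often, and how recently, and to carry them into the analysis alongside the test values. This is offered only as an illustration and is not a remedy for data missing not at random; the 365-day window and the function name `observation_features` are hypothetical choices.

```python
from datetime import date

def observation_features(test_dates, index_date, lookback_days=365):
    """Summarise when a test was recorded, not what its value was.

    Returns whether the test was recorded in the look-back window before
    index_date, how many times, and the days since the most recent prior test
    (None if there is no prior test).
    """
    prior = [d for d in test_dates if d <= index_date]
    window = [d for d in prior if (index_date - d).days <= lookback_days]
    last = max(prior) if prior else None
    return {
        "tested_in_window": bool(window),
        "n_tests_in_window": len(window),
        "days_since_last_test": (index_date - last).days if last is not None else None,
    }

dates = [date(2019, 1, 10), date(2019, 11, 1), date(2020, 2, 1)]
print(observation_features(dates, index_date=date(2020, 1, 1)))
```

Tests recorded after the index date are excluded so that the features use only information available at that point in the patient's record.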

Conclusion

EHRs are a rich source of data, offering large sample sizes for population-based epidemiological estimates at low cost and with an acceptable level of generalizability. They allow research questions to be posed and answered with fewer assumptions through modelling studies that might not otherwise be performed, particularly for vulnerable sub-groups that are difficult to study in a controlled trial setting.