Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection with any of a variety of microorganisms [1]. The dysregulated immune response leads to an uncontrolled systemic inflammatory response, with resultant tissue and multi-organ dysfunction [2]. Most patients develop sepsis outside the hospital (community-acquired sepsis), while some develop it while hospitalized, often in relation to invasive devices, procedures, or operations (hospital-acquired sepsis). Although definitions vary between studies, hospital-acquired cases represent about 10.1% to 53.0% of all sepsis cases [3,4,5,6,7,8].

The National Health Service (NHS) and UK Health Security Agency (UKHSA) are committed to tackling health inequalities [9]. A recent literature review by our group reported on risk factors for sepsis that were associated with health inequalities [10]. We found that socioeconomic factors associated with increased sepsis incidence included lower socioeconomic status and lower education level. However, findings were not consistent across studies and most of the studies were conducted in the USA. For ethnicity, mixed results were reported [10]. Also, only a few studies in the literature have evaluated the incidence and predictors of community-acquired sepsis. The purpose of this study was to measure the association of specific exposures (deprivation, ethnicity, and clinical characteristics) with incident sepsis and case fatality. The approach was data driven, without prior hypotheses about specific predictor effects (we use the term predictors for exposures that are associated with the outcome sepsis, without implying causality).

Materials and methods


Data sources were the Clinical Practice Research Datalink (CPRD) GOLD [11] and CPRD Aurum [12], which contain longitudinal, anonymized, patient-level electronic health records (EHRs) from general practices in the UK. Almost all UK residents are registered with a general practice, which typically provides almost all primary healthcare. When a patient receives emergency care (e.g., at Accident and Emergency) or inpatient or outpatient hospital care, the patient’s general practice is typically informed. All UK general practices use EHRs, which are provided by several different EHR vendors, including EMIS and Vision. EMIS is the most frequently used primary care EHR [13]. The CPRD GOLD database includes general practices that use the Vision EHR software system, while CPRD Aurum practices use EMIS Web. CPRD GOLD included data on about 11.3 million patients [11] and CPRD Aurum included data on 19 million patients [12]. These databases include clinical diagnoses, prescribed medication, vaccination history, lifestyle information, and clinical referrals, as well as the patient’s age, sex, ethnicity, smoking history, and body mass index (BMI). Patient-level data from the general practices were linked to Hospital Episode Statistics (HES), a database containing details about hospital admissions. Medical charts with longitudinal information collected during a hospital admission are reviewed and coded by the hospital using the ICD-10 dictionary, and the clinical codes and dates are provided to the HES database. Patient records were also linked to small-area deprivation information using socioeconomic information from the Index of Multiple Deprivation (IMD), based on the patient’s residential postcode [14]. Patient-level IMD was aggregated into quintiles for the current analysis.

Study population

This study focused on community-acquired sepsis (most frequent) given the difference in etiology with hospital-acquired sepsis, and on patients aged 65–100 given their higher rates of sepsis. This study was done simultaneously with a study with similar objectives but that used a different English data source [15].

The overall study population consisted of patients aged 65–100 years at any time during the observation period (from January 1, 2000, to July 1, 2020, for CPRD GOLD or up to September 1, 2020, for CPRD Aurum) and who were registered at a GP practice in England. The lower age limit was related to inclusion criteria in the approved protocol; the upper age limit was selected based on the challenges in matching very elderly patients. The practices were restricted to those that contributed to CPRD GOLD or CPRD Aurum and that participated in record linkage. Patient information included sex, age, ethnicity, and medical history. Follow-up of individual patients was defined from the earliest of: (a) their start date of registration with a general practice, (b) prior duration of the patient’s registration in the practice of at least 1 year, or (c) time of reaching age 65 years, until the earliest of: (a) end date due to patients leaving practice, (b) death or (c) time of reaching 101 years of age.

A case–control methodology was selected to measure the association between individual exposures and sepsis. Cases were patients who had a hospital record with a sepsis diagnosis (based on the ICD-10 codes for sepsis in HES, A40 and A41). Only incident cases (i.e., the first sepsis record) were included in the study. Each case was randomly matched with up to six controls who had not been hospitalized in the year before. The matching was done by age, sex, calendar time (stepwise by same calendar year and quarter of year, calendar year and then within 5 years), and level of clinical coding in a practice. The mean level of coding of clinical information was assessed for each general practice (details are provided elsewhere [16]). Sepsis cases were stratified into community- and hospital-acquired cases. Community-acquired cases were defined as those with a sepsis record within 2 days of the date of hospital admission; hospital-acquired cases were those with a sepsis record more than 2 days after the hospital admission. Patients were classified at 3-monthly intervals into four frailty groups using the Qfrailty classification, which is based on the Qmortality score [17] (predicting risk of all-cause mortality) in conjunction with the Qadmissions score [18]. Qfrailty was categorized as severe, moderate, minor or non-frailty. The most recent record for frailty prior to the index date was used. Body mass index (BMI), smoking history, and history of 60 clinical conditions prior to the index date were also measured (using code lists from different sources including [19]). Antibiotic exposure in the 2 months before the index date was also measured (as an indicator of GP-diagnosed presence of infection).
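The 2-day rule that separates community-acquired from hospital-acquired cases can be sketched in a few lines. This is an illustrative sketch, not the study code: the function name `classify_sepsis`, the inclusive treatment of day 2, and the rejection of records dated before admission are all assumptions.

```python
from datetime import date


def classify_sepsis(admission_date: date, sepsis_record_date: date,
                    window_days: int = 2) -> str:
    """Label a sepsis record by its timing relative to hospital admission.

    Assumption: a record dated 0 to `window_days` days after admission
    (inclusive) counts as community-acquired; anything later counts as
    hospital-acquired.
    """
    days_after_admission = (sepsis_record_date - admission_date).days
    if days_after_admission < 0:
        # Assumption: a sepsis record before admission is treated as a data error.
        raise ValueError("sepsis record precedes hospital admission")
    if days_after_admission <= window_days:
        return "community-acquired"
    return "hospital-acquired"
```

For example, under these assumptions a sepsis record dated 2 days after admission is labeled community-acquired and one dated 3 days after admission is labeled hospital-acquired.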

All-cause mortality in the 30 days after the date of the sepsis hospital record (i.e., the case fatality rate) was assessed for the sepsis cases using linked death certificates. To explore the effects of age and sex matching on the discrimination of the logistic models between sepsis cases and controls, a second case–control dataset was created by matching cases to controls only on calendar time and level of clinical coding in a practice.

The analyses of the associations of specific exposures (deprivation, ethnicity, and clinical characteristics) with the risk of developing sepsis were conducted in two separate parts. The first focused on deprivation, ethnicity, frailty, BMI, smoking history, and prior antibiotic exposure. The second focused on the 60 clinical characteristics. The clinical characteristics were analyzed separately from, e.g., deprivation because possible causal pathways could be bi-directional (e.g., deprivation could lead to higher incidence of diabetes mellitus but diabetes could also lead to deprivation). With such complex possible causal pathways, adjustment for these variables is not statistically preferable.

Statistical analysis

The matching for age was based on a propensity matching procedure using a caliper (pre-specified maximum difference) of 0.25 of the logit of the propensity score [20]. Greedy nearest neighbor matching was used to select the control unit nearest to each treated unit. Patients were only included once in the analysis. The SAS procedure PSMATCH was used to conduct the matching.
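The greedy nearest-neighbor step with a caliper can be illustrated with a small standard-library sketch. It is not the PSMATCH procedure itself. It assumes that the caliper width is 0.25 times the standard deviation of the logit of the propensity score across all patients, that each control is used at most once, and that cases are processed in input order; the function names are hypothetical.

```python
import math
from statistics import pstdev


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(case_scores, control_scores, k=6, caliper_mult=0.25):
    """Greedy nearest-neighbor matching on the logit of the propensity score.

    case_scores / control_scores map a patient id to a propensity score.
    Returns {case_id: [control_id, ...]} with at most k controls per case.
    """
    case_logit = {i: logit(p) for i, p in case_scores.items()}
    ctrl_logit = {i: logit(p) for i, p in control_scores.items()}
    # Assumption: caliper = multiplier x SD of the logit over all patients.
    all_logits = list(case_logit.values()) + list(ctrl_logit.values())
    caliper = caliper_mult * pstdev(all_logits)
    available = set(ctrl_logit)  # controls not yet matched to any case
    matches = {}
    for case_id, case_value in case_logit.items():
        # Keep only unused controls inside the caliper, nearest first.
        candidates = sorted(
            (abs(ctrl_logit[c] - case_value), c)
            for c in available
            if abs(ctrl_logit[c] - case_value) <= caliper
        )
        chosen = [c for _, c in candidates[:k]]
        available.difference_update(chosen)  # match without replacement
        matches[case_id] = chosen
    return matches
```

Because the procedure is greedy, the result depends on the order in which cases are processed; a case handled late may find that its nearest controls have already been taken.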

Conditional logistic regression models analyzed the overall effects of individual exposures. Odds ratios (ORs) and 95% confidence intervals (95% CIs) were estimated. Crude ORs assessed the effect of an individual predictor on developing sepsis in matched cases and controls. Adjusted ORs estimated the effects adjusted for other predictors. Random forest (RF) models assessed the relative importance of the 60 clinical characteristics and antibiotic exposure in discriminating between cases and controls; these models predict the probabilities (RF scores) of being a case or control. RF is a supervised tree-based classifier developed by Breiman [21]. Tree-based methods such as RF can outperform techniques such as logistic regression for sub-group classification because the subgroups are difficult to define a priori [22]. A recent study used RF models to identify the medicine combinations associated with higher risks of adverse drug-related hospital admission [23]. The RF models estimated the variable importance index (also known as Gini index), which ranks the explanatory (independent) variables by their importance in the tree classifications. We pragmatically set the maximum number of trees to 500, the maximum depth to 50, and the leaf node parameter to 25. Sensitivity analyses were conducted by doubling the number of trees and doubling the leaf node parameter. The RF scores were divided into decile groups in the fourth analysis (ranging from low to high risk of developing sepsis) and the distribution of deprivation, ethnicity, and frailty was assessed across these deciles.
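The division of RF scores into decile groups can be sketched as follows. This is a minimal illustration rather than the study code: the function name `assign_deciles` and the rank-based cut (tied scores are split by input order) are assumptions.

```python
def assign_deciles(scores):
    """Return a decile label for each score (1 = lowest, 10 = highest).

    Patients are ranked by score and the ranking is cut into ten groups
    of as near equal size as possible. Assumption: tied scores are
    separated by their order in the input.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles
```

Given such labels, the share of, for example, severely frail patients within each decile is a simple grouped proportion.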

Case fatality rates (i.e., 30-day all-cause mortality) were analyzed in unconditional logistic regression models. Crude models evaluated the effects of individual exposures, and adjusted models included all exposures analyzed for case fatality.
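For a single binary exposure, the crude OR from an unconditional logistic regression equals the cross-product ratio of the 2×2 table, and its Wald 95% CI equals the interval built from the standard error of the log OR. The sketch below shows that calculation; the function name and the counts in the usage example are hypothetical and are not study data.

```python
import math


def crude_odds_ratio(exposed_deaths, exposed_survivors,
                     unexposed_deaths, unexposed_survivors, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, lower, upper). All four cells must be non-zero;
    a zero cell raises ZeroDivisionError in this simple sketch.
    """
    a, b = exposed_deaths, exposed_survivors
    c, d = unexposed_deaths, unexposed_survivors
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 deaths among 100 exposed, 10 among 100 unexposed.
or_, lo, hi = crude_odds_ratio(30, 70, 10, 90)
print(f"crude OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Adjusted ORs cannot be obtained this way; they require fitting the full multivariable model.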


Results

In the matching process, 99.3% of the sepsis patients were matched to at least one control. Of the matched cases, 94.1% were matched to six controls and 0.2% to one control. Of the cases, 45.1% were hospitalized during the calendar years 2015–2020. Table 1 shows the characteristics of matched sepsis cases and controls. Cases and controls were well matched on age and sex. The mean age was 80.6 years for cases and 80.4 years for controls. The percentage of women was 50.8% in cases and 51.4% in controls (this small difference was related to the ratio of controls to cases varying between men and women). Of the 119,529 cases, 108,317 (90.6%) were classified as community-acquired sepsis.

Table 1 Characteristics of matched sepsis cases and controls

As shown in Table 2, severe frailty was strongly associated with the risk of developing community-acquired sepsis (crude OR 14.93; 95% CI 14.37–15.52). The most deprived patients (with deprivation measured by IMD) also showed an increased risk of community-acquired sepsis (crude OR 1.48; 95% CI 1.45–1.51). Non-white ethnic groups showed lower risks of developing sepsis (crude OR 0.92 in Black people; 95% CI 0.86–0.97). Of the community-acquired sepsis cases, 34.1% had received an antibiotic in the 2 months before the index date, compared with 11.0% of the controls. The presence of infection (as measured by antibiotic exposure in the prior 2 months) was also strongly associated with the risk of community-acquired sepsis (crude OR 4.43; 95% CI 4.36–4.50).

Table 2 Crude odds ratios of sepsis by ethnicity, deprivation, frailty, BMI, smoking history, and antibiotic prescribing in prior 2 months stratified by type of sepsis

Of the 60 clinical characteristics evaluated, strong predictors for community-acquired sepsis included chronic hepatitis (crude OR 2.89; 95% CI 2.72–3.08), being housebound (crude OR 2.66; 95% CI 2.62–2.70), and learning disability (crude OR 3.02; 95% CI 2.68–3.40) (Table 3). Table 4 shows the distribution of frailty, ethnicity, and deprivation by deciles of RF scores (for community-acquired sepsis). The RF-predicted probability of being a case ranged from 5.9% in the lowest RF decile to 58.6% in the highest decile. Severe frailty was more prevalent in the highest deciles of the RF score (21.5% in the highest decile versus 0% in the lowest decile). Deprivation was strongly associated with higher RF probabilities for developing community-acquired sepsis. A logistic model with RF scores as predictors yielded a c-statistic of 0.788 for the discrimination between sepsis cases and controls.
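The c-statistic reported above can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical, and this is not the software used in the study.

```python
def c_statistic(case_scores, control_scores):
    """Concordance (c-statistic) between case and control scores.

    Counts, over all case-control pairs, how often the case has the
    higher score; tied pairs contribute one half.
    """
    concordant = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Toy example: three of the four case-control pairs are concordant.
print(c_statistic([0.8, 0.6], [0.7, 0.2]))
```

The pairwise loop is quadratic in the number of patients, so for datasets of the size analyzed here a rank-based calculation would be preferable in practice.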

Table 3 Crude odds ratios of developing sepsis for 60 clinical characteristics stratified by sepsis type
Table 4 Distribution of frailty, ethnicity, and deprivation by deciles of random forest scores for community-acquired sepsis

All-cause mortality within 30 days was high in community-acquired sepsis cases (Table 5). Severely frail patients had a case fatality rate of 42.0%, while non-frail patients had a rate of 24.0% (crude OR 2.30; 95% CI 2.17–2.43, adjusted OR 1.53; 95% CI 1.41–1.65). Sepsis cases with antibiotic exposure in the prior 2 months were less likely to die than sepsis cases without such exposure (crude OR 0.71; 95% CI 0.70–0.73, adjusted OR 0.74; 95% CI 0.72–0.76). Case fatality rates strongly decreased over calendar time. The adjusted OR for a yearly change in sepsis mortality was 0.94 (95% CI 0.94–0.95).

Table 5 Crude odds ratios of all-cause mortality within 30 days after hospital admission for community-acquired sepsis for age, sex, calendar time, deprivation, ethnicity, frailty, and antibiotic exposure in prior 2 months


Discussion

This study found that severe frailty was strongly associated with the risk of developing sepsis. The most deprived patients also showed a 48% increase in the odds of sepsis. Other strong predictors for developing sepsis included antibiotic exposure in the prior 2 months, being housebound, having cancer, a skin ulcer, or diabetes mellitus. Fatality rates of sepsis were high, and much higher in severely frail patients than in non-frail patients. Sepsis cases with recent prior antibiotic exposure were less likely to die than those without such exposure. Case fatality strongly decreased over calendar time.

There are several limitations in this study. The first was that the sepsis diagnosis was based on coded data (as recorded by each hospital at discharge or death within the hospital, without clinical details of severity or the specific criteria supporting the sepsis diagnosis). The diagnostic criteria for sepsis have also changed over the last 2 decades and this study could not apply the latest criteria. However, sensitivity analyses showed only small effects on the ORs of sepsis for ethnicity, deprivation, and frailty. Also, coding quality may vary between hospitals [24], although any misclassification is likely to be random and non-differential, leading to underestimates of associations. Another limitation was that this study used broad categories for ethnicity and deprivation, while these characteristics involve heterogeneous patient groups with diverse drivers for the incidence of sepsis. The study was observational, and patients could not be randomized between different categories, so we could not distinguish between direct causal effects of, e.g., ethnicity and indirect effects through a higher prevalence of causal factors in these groups. This study assessed the calibration of logistic models. As this analysis was based on a case–control study, the results cannot be generalized to performance in the general population, because the rate of the outcome sepsis is very different in a population compared to a case–control setting.

Most published studies on sepsis were hospital-based, with limited data on prior medical history and without population-based controls. No studies on community-acquired sepsis have been conducted in the UK, with the exception of our recent study that used OpenSAFELY, included all ages, and covered recent calendar time [15]. In that study of about 250,000 sepsis cases (about 80% were community-acquired), similar results were found. Socioeconomic deprivation and comorbidity were associated with increased odds of developing non-COVID-19-related sepsis and of 30-day mortality in England [15]. With respect to deprivation, four population-based studies on sepsis incidence were found in the literature (other than our recent OpenSAFELY study). All reported increased rates of sepsis incidence with deprivation [25,26,27,28]. Two of these studies did not differentiate between hospital- and community-acquired sepsis, which often have different causes and predictors. The two other studies did evaluate community-acquired sepsis, although they included only about 3500 sepsis cases [27, 28]. A prospective cohort with 30,000 US participants also reported a risk prediction model for the development of community-acquired sepsis. It included a smaller number of clinical risk factors such as chronic lung disease, peripheral artery disease, diabetes, stroke, atrial fibrillation, coronary artery disease, hypertension, and deep vein thrombosis [29]. The strength of the present study is that it included a large number of clinical risk factors for a large number of sepsis cases. There is an urgent need to improve our understanding of risk factors for community-acquired sepsis (which in this study accounted for about 90% of all sepsis cases). As outlined by Kempker et al., sepsis could be viewed as a preventable challenge that can be addressed with population and system-based solutions, including management of risk factors, appropriate and risk-proportionate antibiotic usage, public awareness, hygiene, and immunization [30].

The National Institute for Health and Care Excellence (NICE) in England has developed a guideline for the recognition, diagnosis, and early management of sepsis [31]. It individually lists patient groups at higher risk of developing sepsis. Most of these are chronic risk factors (such as elderly age, impaired immunity) or include those that affect a substantial number of patients (such as diabetes or other comorbidities). The challenge is that the pathogenesis of sepsis is rapid, and interventions need to be targeted to early triggers of deterioration. A recent review looked at studies of sepsis triggers and tools to support better recognition in healthcare settings. Only 17.7% of identified studies concerned pre-hospital settings [32] and most of those concerned screening by paramedics [33]. Furthermore, some existing tools, such as the Modified Early Warning Score (MEWS) [34], Robson criteria [35], Simple Sepsis Early Prognostic Score [36], and a machine learning model [37], mostly concern physiological measurements to support earlier recognition of acute decline. Another widely used tool is NEWS-2, which uses physiological measurements already recorded in routine practice [38]. However, this tool has not been validated in primary care settings [39]. While these tools focus on early recognition of sepsis in the hospital setting [40], there is a lack of monitoring tools that have been tested and can be used at home by patients at high risk of developing sepsis to facilitate earlier contact with the healthcare system. Remote patient monitoring has been used in patients with COVID-19 for early identification of deterioration [41].

The implication of this study is that there is a need for prediction models for the risk of developing sepsis that can help to target preventative antibiotic therapy. Important predictors included frailty, deprivation, learning disability, and conditions such as diabetes mellitus and being housebound. The finding that frailty is a major predictor for the development of sepsis suggests that interactions between different conditions likely affect the risk of sepsis. The most important predictor in our risk stratification, as expected, was an indicator of infection (antibiotic use in the prior 2 months). Thus, there is a need to develop risk prediction models that consider not only chronic diseases but also, importantly, the acute early triggers and details on infection severity.

In conclusion, the development of community-acquired sepsis is strongly associated with socioeconomic deprivation and some clinical characteristics. Strong predictors of sepsis included recent prior antibiotic exposure, frailty, and conditions such as diabetes mellitus and being housebound. Case fatality rates of community-acquired sepsis were high, particularly in severely frail patients. Given the variety of predictors and the strength of their associations with developing sepsis, there is a need for prediction models for the risk of developing sepsis that can help to target preventative antibiotic therapy.