The management of sepsis is a highly complex, multifaceted challenge that remains the realm of highly trained human experts. But as medical applications of artificial intelligence continue to multiply, it is becoming apparent that some of these decisions could soon be delegated to machines that could be dubbed “intelligent”, improving clinical practice and patient outcomes [1]. Indeed, most of the tasks involved in the clinical management of sepsis (early recognition, selection of antibiotic therapy, haemodynamic optimisation, etc.) could be individually performed or optimised by dedicated algorithms.

Most of what we call “artificial intelligence” is in fact machine learning, a set of computer tools intended to generate new knowledge from data [1]. Machine learning includes three main categories of techniques: supervised learning (which uses labelled data to build a prediction model, for example for prognostication), unsupervised learning (which discovers patterns in data and generates clusters of subjects that share common characteristics) and reinforcement learning (where a sequential decision process is modelled and optimised).

Below, I have selected a few significant applications that I consider the most likely to land in the clinical environment in the near future, either because of their robustness or their potential.

Notable applications of artificial intelligence in sepsis

First of all, it appears possible for automated algorithms to identify patients at risk of sepsis, either in real time (“sepsis detection”) or in advance (“sepsis prediction”) [2]. This can be achieved with a range of supervised learning algorithms trained on a dataset containing positive and negative instances of sepsis. For example, a model based on gradient tree boosting predicted sepsis and septic shock with good accuracy several hours before onset, using only vital signs [3]. Even simpler rule-based algorithms can highlight at-risk patients, for example by detecting end-organ damage together with the non-specific systemic inflammatory response syndrome (SIRS) [4], or by computing quick Sequential Organ Failure Assessment (qSOFA) or SOFA scores [5]. Remarkably, the small study by Shimabukuro and colleagues is one of the few randomised trials to have demonstrated improved outcomes with an algorithm, which carries an important lesson: successful deployment and acceptance by clinicians matter much more than the intrinsic complexity of an algorithm [4].
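To make the rule-based approach concrete, the sketch below computes a qSOFA score (one point each for a respiratory rate of 22/min or more, a systolic blood pressure of 100 mmHg or less and altered mentation) and flags patients scoring 2 or more. It is a minimal illustration built on a hypothetical Observation record, and it assumes that altered mentation can be approximated as a Glasgow Coma Scale below 15; it is not the implementation used in any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One set of bedside observations (hypothetical record layout)."""
    respiratory_rate: float  # breaths per minute
    systolic_bp: float       # mmHg
    gcs: int                 # Glasgow Coma Scale, 3-15


def qsofa_score(obs: Observation) -> int:
    """Quick SOFA score (0-3): one point per criterion met."""
    score = 0
    if obs.respiratory_rate >= 22:
        score += 1
    if obs.systolic_bp <= 100:
        score += 1
    # Altered mentation approximated as GCS < 15 (an assumption; a local
    # implementation might use a nurse-recorded mental status flag instead).
    if obs.gcs < 15:
        score += 1
    return score


def flag_at_risk(observations: list[Observation], threshold: int = 2) -> list[bool]:
    """Flag each observation whose qSOFA score reaches the alert threshold."""
    return [qsofa_score(obs) >= threshold for obs in observations]


ward = [
    Observation(respiratory_rate=24, systolic_bp=95, gcs=15),   # 2 criteria
    Observation(respiratory_rate=16, systolic_bp=120, gcs=15),  # 0 criteria
    Observation(respiratory_rate=18, systolic_bp=98, gcs=13),   # 2 criteria
]
print(flag_at_risk(ward))  # [True, False, True]
```

The threshold of 2 is the usual qSOFA cut-off; lowering it trades specificity for sensitivity, which is exactly the kind of choice that deployment studies need to evaluate.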

Next, unsupervised learning can be used to identify subgroups of patients within sepsis, arguably a highly heterogeneous syndrome. Various teams have applied clustering algorithms to separate patients based on their clinical, biological or omic data [6, 7]. While most of this research remains exploratory and hypothesis-generating, practical uses are now becoming conceivable. For example, Antcliffe and colleagues used transcriptomic data to show that one subgroup of patients had poorer outcomes when given steroids, a finding that may have clinical applications in the future [7].
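As an illustration of the clustering idea, the following sketch groups patients with plain k-means after standardising each variable. k-means is only one common choice and is an assumption here; the cited studies may rely on other clustering methods, and the variables and synthetic cohort in the usage lines are hypothetical.

```python
import numpy as np


def standardise(x: np.ndarray) -> np.ndarray:
    """Z-score each column so variables measured in different units weigh equally."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (x - x.mean(axis=0)) / std


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> tuple[np.ndarray, np.ndarray]:
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    # Start from the first patient, then repeatedly add the patient farthest
    # from every centroid chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each patient to the nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each centroid to the mean of its patients (keep it if the cluster is empty).
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic cohort with hypothetical columns: lactate (mmol/L), heart rate (/min),
# creatinine (µmol/L); two artificial phenotypes of 50 patients each.
rng = np.random.default_rng(0)
phenotype_a = rng.normal([1.5, 90, 80], [0.3, 5, 10], size=(50, 3))
phenotype_b = rng.normal([6.0, 130, 250], [0.5, 5, 20], size=(50, 3))
labels, _ = kmeans(standardise(np.vstack([phenotype_a, phenotype_b])), k=2)
print(np.bincount(labels))  # [50 50]
```

Real cohorts are rarely this well separated, and the number of clusters is itself uncertain, which is one reason why most of this work remains hypothesis-generating.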

The cornerstone of the haemodynamic management of sepsis remains, to this day, the optimisation of the circulation with intravenous fluids and/or vasopressors. Correct timing and dosing of these treatments are highly challenging, and reinforcement learning may help with this difficult task [8]. In this approach, the disease course of septic patients is modelled as a sequence of health states; the decisions taken by human clinicians in each state are then analysed and their value quantified [8]. Finally, the decisions most likely to lead to improved organ function and/or survival are identified. A different approach could use complex causal inference methods to identify the best time to start vasopressors, by quantifying the relative risk of death for different timing options [9].
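The reinforcement learning workflow described above can be sketched, in a heavily simplified form, as a tabular Markov decision process: transition probabilities and rewards are estimated from the clinicians' observed decisions, then value iteration identifies the action with the highest expected return in each health state. This assumes that patients have already been mapped to a few discrete health states and treatment options; the states, actions, rewards (+1 for survival, -1 for death) and counts below are hypothetical, and this is not the exact method of [8].

```python
import numpy as np


def build_mdp(transitions, n_states, n_actions):
    """Estimate transition probabilities and mean rewards from observed clinician data.

    Each transition is (state, action, reward, next_state, done). Terminal
    transitions (death or discharge) contribute their reward but no next state.
    """
    counts = np.zeros((n_states, n_actions))
    next_counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next, done in transitions:
        counts[s, a] += 1
        reward_sum[s, a] += r
        if not done:
            next_counts[s, a, s_next] += 1
    seen = counts > 0                   # (state, action) pairs clinicians actually used
    safe = np.where(seen, counts, 1.0)  # avoid dividing by zero for unseen pairs
    P = next_counts / safe[:, :, None]  # P[s, a, s']; rows sum to P(not terminal)
    R = reward_sum / safe               # mean immediate reward
    return P, R, seen


def value_iteration(P, R, seen, gamma=0.99, tol=1e-8):
    """Return the greedy policy and state values, restricted to observed actions."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)         # expected return of each (state, action)
        Q = np.where(seen, Q, -np.inf)  # never recommend an action with no data
        V_new = np.where(seen.any(axis=1), Q.max(axis=1), 0.0)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


# Hypothetical data: states 0 = severe hypotension, 1 = moderate, 2 = stable;
# actions 0 = fluids only, 1 = fluids + vasopressor.
data = (
    [(0, 1, 0.0, 1, False)] * 8 + [(0, 1, -1.0, None, True)] * 2
    + [(0, 0, 0.0, 1, False)] * 3 + [(0, 0, -1.0, None, True)] * 7
    + [(1, 0, 0.0, 2, False)] * 9 + [(1, 0, -1.0, None, True)] * 1
    + [(1, 1, 0.0, 2, False)] * 5 + [(1, 1, -1.0, None, True)] * 5
    + [(2, 0, 1.0, None, True)] * 10
)
policy, values = value_iteration(*build_mdp(data, n_states=3, n_actions=2))
print(policy)  # [1 0 0]
```

Restricting recommendations to actions that clinicians were actually observed to take in each state is a deliberate design choice: the data carry no information about the consequences of actions that were never taken.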

Assessing the value of new decisions in a purely retrospective fashion is challenging: when the algorithm and the clinicians disagree, it is difficult to estimate reliably what would have happened to the patient. Retrospective validation therefore relies on expert assessment of the model’s behaviour, sensitivity analyses to test its robustness and a family of statistical methods called “off-policy evaluation” that estimate the value of the algorithm’s decisions, all of which have limitations [8, 10].
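One of the simplest off-policy evaluation estimators, weighted importance sampling, reweights each observed patient trajectory by how likely the algorithm’s policy would have been to take the same actions as the clinicians. The sketch below assumes that the clinicians’ (behaviour) policy probabilities are known or have already been estimated; the function name, array names and toy trajectories are hypothetical, and other estimators, such as doubly robust methods, also exist.

```python
import numpy as np


def weighted_importance_sampling(trajectories, pi_e, pi_b, gamma=1.0):
    """Weighted (self-normalised) importance sampling estimate of a policy's value.

    trajectories: list of episodes, each a list of (state, action, reward).
    pi_e, pi_b: arrays of shape (n_states, n_actions) with action probabilities
    under the evaluated policy and the clinicians' behaviour policy. pi_b must be
    positive for every observed (state, action) pair.
    """
    weights, returns = [], []
    for episode in trajectories:
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            rho *= pi_e[s, a] / pi_b[s, a]  # cumulative likelihood ratio
            ret += gamma ** t * r           # discounted return of this patient
        weights.append(rho)
        returns.append(ret)
    weights = np.asarray(weights)
    returns = np.asarray(returns)
    if weights.sum() == 0:
        raise ValueError("the evaluated policy never matches the observed actions")
    return float(np.sum(weights * returns) / np.sum(weights))


# Hypothetical toy data: 2 health states, 2 actions, clinicians choosing at random.
pi_b = np.full((2, 2), 0.5)
pi_e = np.array([[0.0, 1.0], [0.0, 1.0]])  # the algorithm always picks action 1
episodes = [
    [(0, 1, 0.0), (1, 1, 1.0)],   # matches the algorithm, patient survives
    [(0, 0, 0.0), (1, 1, -1.0)],  # disagrees at the first step, patient dies
    [(0, 1, 0.0), (1, 0, -1.0)],  # disagrees at the second step, patient dies
    [(0, 1, 0.0), (1, 1, 1.0)],   # matches the algorithm, patient survives
]
print(weighted_importance_sampling(episodes, pi_e, pi_b))  # 1.0
```

The example also exposes a key limitation: trajectories in which the algorithm disagrees with the clinicians receive zero weight, so the estimate rests on the few patients whose care happened to match the algorithm, which inflates its variance.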

What next? The route to clinical deployment

The examples listed above have some potential to improve the outcomes of septic patients, but prospective validation in the clinical environment is largely lacking (the “byte to bedside” gap), so evidence of efficacy and safety in the real world remains scarce [1, 11]. The path to regulatory approval (e.g. CE marking) is long and complex and beyond the scope of this paper, but regulatory bodies are getting up to speed and several products are now cleared for clinical use [1]. In the meantime, the question of the legal framework is sidestepped by leaving responsibility for the final decision with clinicians.

Let us outline what the route to market could look like. First, an algorithm needs to be externally validated, which poses the non-trivial challenges of accessing different datasets and making them compatible. Then, prospective testing could be conducted in “silent” mode, with the algorithm’s outputs reviewed, for example by off-duty clinicians, without informing patient care. The next step would be a randomised controlled trial comparing physicians alone with physicians assisted by the algorithm. Besides clinical benefit, the developers will have to demonstrate that their tool is interpretable, provides confidence intervals for its estimates and can detect deviations from its required behaviour. If it does, this should constitute sufficient evidence for approval by the regulatory bodies.

A parallel can be drawn between the traditional drug development pathway of pharmaceutical companies and the burgeoning field of “medical artificial intelligence” (Fig. 1). The process of bringing a new medication to market is long and difficult, with most candidates being discarded along the way. Similarly, we can expect most published algorithms never to have an impact on clinical care, and indeed very few systems are currently approved for clinical use [1]. Above all, what we need is more algorithms, more teams testing them in a systematic, safe and controlled fashion, and more support from funding and regulatory agencies.

Fig. 1

Comparing the development processes of drugs and artificial intelligence-based clinical tools. The process of bringing a new drug to market is long and complex, takes over 10 years and costs billions of dollars. Starting from a large pool of potential drug candidates, the set of molecules is narrowed down through several steps of pre-clinical and clinical testing. Similarly, the nascent field of AI in healthcare is at the stage of producing many potential algorithms, very few of which will be approved for clinical use and deployed at scale. The numbers of drug and algorithm candidates, as well as the timescales, are provided for illustration purposes only.

Once deployed, these systems will be available to clinicians as real-time bedside tools, for example to accelerate sepsis recognition or to suggest a dose range of intravenous fluids. If the evidence of their superiority is strong enough, we can even anticipate that their use might become mandatory, similar to what exists, for example, in the aviation industry with automated landing systems [12].

To conclude, the future is bright for artificial intelligence in medicine, and applications in sepsis are nearing clinical deployment. If we manage to produce and test enough candidate systems, success will become inevitable, and we will see the day when patients with sepsis are optimally managed by algorithms and human clinicians working hand in hand.