Introduction

The science of outcome prediction is particularly useful in the emergency room – the entry point to acute care for many patients. In this issue of Critical Care, Jaimes and coworkers [1] evaluate the usefulness of artificial neural networks (ANNs) in predicting hospital mortality in patients presenting to the emergency room with suspected sepsis. Constructing a prediction tool is a difficult undertaking; it requires careful methodological consideration and validation before the predictions can be deemed valid and reliable in naïve patients [2, 3]. These tools identify associations between the outcome of interest and the empiric risk factors that contribute to it. A well designed tool will typically possess three qualities: discrimination (the ability to distinguish accurately those patients who will reach the outcome from those who will not), goodness of fit (the ability to match predicted and actual outcomes, such as mortality rate, accurately in subgroups of patients), and the ability to achieve these predictions in cohorts of patients similar to those in which the tool was developed [4, 5].
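As a minimal sketch of the first two qualities, the Python fragment below computes a c-statistic (one common measure of discrimination) and a simple table of predicted versus observed mortality by risk group (one way to inspect goodness of fit). The function names and the eight example patients are hypothetical and are not drawn from the study under discussion.

```python
def c_statistic(predicted, died):
    """Fraction of (non-survivor, survivor) pairs in which the non-survivor
    received the higher predicted risk; ties count as one half."""
    non_survivors = [p for p, d in zip(predicted, died) if d == 1]
    survivors = [p for p, d in zip(predicted, died) if d == 0]
    if not non_survivors or not survivors:
        raise ValueError("both outcomes must be present")
    score = 0.0
    for p_dead in non_survivors:
        for p_alive in survivors:
            if p_dead > p_alive:
                score += 1.0
            elif p_dead == p_alive:
                score += 0.5
    return score / (len(non_survivors) * len(survivors))


def calibration_table(predicted, died, n_groups=2):
    """Sort patients by predicted risk, split them into n_groups, and return
    (mean predicted risk, observed mortality rate) for each group."""
    if not 1 <= n_groups <= len(predicted):
        raise ValueError("n_groups must be between 1 and the number of patients")
    pairs = sorted(zip(predicted, died))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(d for _, d in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows


# Hypothetical predicted risks and outcomes (1 = died) for eight patients.
example_predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
example_died = [0, 0, 0, 1, 0, 1, 1, 1]

# 15 of the 16 non-survivor/survivor pairs are ranked correctly.
print(c_statistic(example_predicted, example_died))
print(calibration_table(example_predicted, example_died, n_groups=2))
```

In practice these quantities would be computed on patients not used to build the tool, which is what the third quality demands.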

Artificial neural networks as prediction tools

Most predictive tools use logistic regression – a well vetted statistical technique that is applicable to situations in which the outcome is binary (e.g. survival/death), measured at a predetermined time in the future [6]. The technique can precisely quantify the relative contribution of each risk factor to outcome, typically crystallized as the odds ratio (i.e. the odds that a patient with the risk factor will reach the outcome relative to the odds for a patient without the risk factor). ANNs represent an alternative technique for achieving predictions. The key difference between the two techniques is that the contribution of each risk factor is not as rigidly dictated with ANNs as it is in a logistic regression model. ANNs can improve predictions by extracting information from unforeseen interactions between predictors. Arguably, if a modeler had foresight of the important interactions present, then they could design an accurate standard logistic regression model without the heavy price associated with the use of an ANN.
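To make the odds ratio concrete, the following sketch computes it from a 2 × 2 table and shows its link to a logistic regression coefficient, which for a binary risk factor in a model with no other covariates is the natural logarithm of the odds ratio. The counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(exposed_died, exposed_survived, unexposed_died, unexposed_survived):
    """Odds of death with the risk factor divided by odds of death without it."""
    odds_exposed = exposed_died / exposed_survived
    odds_unexposed = unexposed_died / unexposed_survived
    return odds_exposed / odds_unexposed


# Hypothetical cohort: 100 patients with the risk factor (30 died)
# and 100 patients without it (10 died).
example_or = odds_ratio(30, 70, 10, 90)  # (30/70) / (10/90) = 27/7

# In an unadjusted logistic regression on this single binary factor,
# the fitted coefficient would equal the log of this odds ratio.
example_coefficient = math.log(example_or)

print(round(example_or, 2), round(example_coefficient, 2))
```

An ANN offers no comparably direct summary, because the effect of one input is spread across many weights.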

Often considered 'black boxes', ANNs suffer from several shortcomings. There is no clear prescription for constructing an ANN. Off-the-shelf software is readily available, but there is little guidance for the novice user as to how ANN parameters, such as the number of intermediate (hidden) layers of neurons, the number of neurons in those layers, the learning rate, the activation functions, and several other tuning parameters, should be chosen and tuned for optimal performance. ANNs, by virtue of the numerous weights linking the neurons, can accommodate nonlinearity, but they also include very large numbers of parameters. There are no clear techniques that provide confidence limits for those parameters. Consequently, the relative contribution of each input risk factor to the outcome is difficult to quantify. Because of the large number of parameters, it is very easy for an ANN to overfit the development dataset. In other words, the predictions offered by the ANN will be overly optimistic in the original population of patients, and this good performance will not generalize well to populations to which the ANN is naïve.
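The scale of the problem can be illustrated by counting parameters. The sketch below assumes a fully connected feed-forward network with one bias per neuron; the layer sizes are hypothetical and are not those used by Jaimes and coworkers.

```python
def count_parameters(layer_sizes):
    """Weights plus biases in a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, inputs first,
    e.g. [10, 8, 1] for 10 inputs, 8 hidden neurons, and 1 output.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Logistic regression with 10 predictors: 10 coefficients + 1 intercept.
logistic_parameters = count_parameters([10, 1])        # 11

# The same 10 predictors fed to one hidden layer of 8 neurons.
one_hidden_layer = count_parameters([10, 8, 1])        # 88 + 9 = 97

# Adding a second hidden layer of 8 neurons.
two_hidden_layers = count_parameters([10, 8, 8, 1])    # 88 + 72 + 9 = 169

print(logistic_parameters, one_hidden_layer, two_hidden_layers)
```

With a development set of a few hundred patients, a model with on the order of a hundred free parameters has ample room to fit noise.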

Jaimes and coworkers [1] do not offer a prediction in a naïve population, nor do those investigators clearly indicate whether techniques were applied to minimize the risk of overfitting. Thus, their conclusion may be overstated. Limited measures guarding against overfitting can be applied within a development set, without specific recourse to an independent validation set. Finally, preparing input data for an ANN requires some amount of a priori knowledge. For example, white blood cell count and body temperature typically have a 'U'-shaped relationship to outcome: both low and high values portend a bad outcome, whereas intermediate values are normal. The predictions of an ANN will be significantly improved if this unintuitive relationship between risk factor and outcome is already 'known' to the modeler ahead of time.
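One way a modeler might encode such prior knowledge is to recode the raw measurement as its distance from a reference value, optionally with a squared term, so that low and high values both map to a large input. The sketch below assumes a reference temperature of 37.0 °C purely for illustration; the function name and the reference value are hypothetical.

```python
def u_shape_features(value, reference):
    """Return (absolute deviation, squared deviation) from a reference value,
    so that abnormally low and abnormally high readings both score high."""
    deviation = value - reference
    return abs(deviation), deviation ** 2


# Assumed reference temperature in degrees Celsius (illustrative only).
REFERENCE_TEMPERATURE = 37.0

for temperature in (35.0, 37.0, 39.5):
    print(temperature, u_shape_features(temperature, REFERENCE_TEMPERATURE))
```

The same recoding would help a logistic regression model, which cannot represent a 'U'-shaped effect from a single untransformed term.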

ANNs have been used by many investigators to predict outcomes in the critically ill [7–11]. Clearly, ANNs – as prediction tools – require significant expertise to build, have significant shortcomings, and should be developed and validated as rigorously as logistic regression models [12]. ANNs do offer greater flexibility and may allow the identification of unforeseen interactions. ANNs can predict continuous outcomes, and thus offer an alternative to multivariable regression. Similarly, ANNs are easily scalable to ordinal or categorical outcomes, and to survival analysis [13], which is much less familiar territory for the critical care physician. However, ANNs can easily be misused, typically because their limitations are not well understood [14, 15].

Conclusion

Therefore, in settings in which familiar tools such as logistic regression can be applied, ANNs should be reserved for those situations where standard models do not perform well; where one suspects the presence of intense but poorly characterized interactions between risk factors; where one does not particularly care about quantifying the relative contributions of risk factors; where appropriate validation is possible; and, most importantly, where proper expertise is readily available.

The science of developing prediction models is best left to the expert, but the clinician can contribute invaluable content knowledge to the process.