An Empirical Analysis of Predictors for Workload Estimation in Healthcare
The limited availability of resources makes the resource allocation strategy a pivotal concern for every clinical department. Allocation is usually based on a workload estimation performed by human experts. Experts must dedicate a significant amount of time to workload estimation, and the usefulness of their estimations depends on their ability to understand very different conditions and situations. Machine learning-based predictors can help reduce the burden on human experts, and can provide some guarantees, at least in terms of the repeatability of the delivered performance. However, it is unclear how good their estimations would be compared to those of experts.
In this paper we address this question by using six algorithms to estimate the workload of future activities in a real-world department. Results suggest that this is a promising avenue for future investigations aimed at optimising the use of resources in clinical departments.
Keywords: Workload estimation, Machine learning, Predictors
In countries where the economy is developing, the increase is due to the improvement of services over time.
In so-called first-world countries, growing life expectancy and low birth rates are already increasing the pressure on healthcare systems (see for instance ).
Remarkably, the problem faced by developed countries points to a scenario in which optimising the available resources will be mandatory to increase the efficiency of the healthcare system and to optimise the services it delivers.
There are many different aspects and perspectives that can be subject to resource optimisation: optimisation can target different levels of the organisational chart, focus on geographical clusters, be tuned to the type of delivered services or clinical domains, and address both administrative and clinical issues. Examples of approaches aimed at optimising the use of resources include a dynamic appointment scheduling system that copes with no-show patients and appointment deletions; a scheduler for Radiology Departments; and a chemotherapy appointment scheduling model under uncertainty. There is growing interest in optimisation approaches, thanks to the potentially large benefits their application could bring to a hospital or a clinical department.
Notably, most of the existing optimisation approaches deal with the allocation of resources once appointment requests are received or an estimation of future workload has been performed. In a sense, this is a kind of reactive optimisation. Intuitively, allocating resources on the basis of estimated workload can lead to better optimisation, since there is no need to wait for actual appointments to be made. This would allow a shift from reactive to proactive optimisation. However, this kind of proactive optimisation is very sensitive to the quality of the predictions provided. Despite being pivotal for the allocation and exploitation of available resources, workload estimation is still mostly performed manually by human experts, who usually have to devote a significant amount of their time to the task. Moreover, the usefulness of the estimations depends on the expert's ability to understand very different conditions and situations, and is very hard to verify. In fact, the same expert can provide both very accurate and very inaccurate estimations, undermining the subsequent allocation processes.
Machine learning-based predictors may help to overcome some of the aforementioned issues. In particular, their use can reduce the burden on experts and provide some general guarantees on the quality of the predictions. Furthermore, machine learning can be used to quickly generate multiple scenarios, which experts can then compare to select the most appropriate one. However, to understand the usefulness of well-known machine learning approaches for this task, it is necessary to assess their ability to estimate future workloads in real-world circumstances.
To address this issue, in this paper we present the results of a large empirical analysis comparing the workload estimation performance of a number of algorithms on real-world data obtained from a Centre of study on Thyroid. To minimise the risk of providing results that are specific only to the case taken into account, we trained the considered algorithms on a restricted set of information that is commonly available in the vast majority of Electronic Health Records (EHR) or appointment booking systems.
The remainder of this paper is organised as follows. First, we describe the materials and methods of the performed analysis. Then, in Sect. 3 we present and discuss the results. Finally, we provide the conclusions of this paper and outline future work.
2 Materials and Methods
(i) Oncological examinations: ambulatory visits aimed at staging the thyroid neoplasm, assessing its progression during treatment, or following up the patient.
(ii) Non-oncological examinations: ambulatory visits for generic consultations on specific non-oncological diseases, such as hypo- or hyperthyroidism (e.g. due to physiological ageing or to more specific causes, such as Basedow's disease).
(iii) Free triiodothyronine (fT3), a thyroid hormone. This analysis only requires a blood sample; for this reason it tends to be relatively cheap to perform, and it is commonly prescribed.
(iv) Free thyroxine (fT4), a thyroid hormone similar to fT3. It is analysed in the same way as fT3 and, together, the two hormones are primarily responsible for the regulation of metabolism.
(v) Parathyroid hormone (PTH), a hormone secreted by the parathyroid glands that plays a relevant role in the regulation of serum calcium.
(vi) Thyroglobulin (Tg), a protein produced and consumed within the thyroid.
(vii) Other common laboratory exams, such as complete blood count, cholesterol, etc.
(viii) Thyroid ultrasound investigation.
(ix) Fine Needle Aspiration Cytology (FNAC): the aspiration of some thyroid cells with a fine needle guided via ultrasound. Due to the invasive nature of the procedure, it requires specific clinical skills and can be considered the most demanding event.
We decided to focus on this level of granularity because, partly as a result of discussions with human experts in the considered medical field, these are key events with regard to the human resources of a department (e.g. FNAC, ultrasound, medical examinations) or with regard to lab time and costs (e.g. fT3, fT4, Tg, PTH). Furthermore, these events are commonly recorded in EHRs, and therefore provide a common ground for applying workload estimation predictors in different units or departments.
Other clinical variables, such as co-morbidities, drugs or biomarkers, were not considered: such data are not always present in the EHR and, when present, are often represented without any specific reference to a shared ontology. For this reason, even if their inclusion had improved the performance of the predictions, it would also have reduced reproducibility.
We considered a total of 5,941 patients treated by the thyroid centre, which led to 42,839 events. The available data were processed as follows. For each of the 9 clinical events analysed in this study, and considering all the patients involved in the event, we divided the logs into two parts, corresponding to an observation time window of at least 18 months before and 18 months after a split point. The prediction task is to estimate the number of events that will occur in the 18 months after the split point, given information about the 18 months before it. It should be noted that a different predictor is built for each of the 9 events, and each predictor is only used to predict the number of future occurrences of its own event. We then trained and tested the predictors using a cross-validation jackknife approach, where \(90\%\) of the available data are used for training and the remaining \(10\%\) for testing.
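The windowing and the 90/10 partition described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: the event log is taken to be a list of hypothetical `(patient_id, event_type, month)` tuples, a single split month (`cutoff`) is shared by all patients, the labels in `EVENT_TYPES` are hypothetical names for the nine events listed earlier, and only patients with at least one event in the observation window are kept.

```python
import random
from collections import Counter

# Hypothetical labels for the nine clinical events (i)-(ix).
EVENT_TYPES = ["onco_visit", "non_onco_visit", "fT3", "fT4", "PTH",
               "Tg", "other_lab", "ultrasound", "FNAC"]


def build_dataset(events, cutoff, target, window=18):
    """Turn an event log into (patient_id, features, label) rows for one target event.

    events: iterable of (patient_id, event_type, month) tuples, month as an int.
    features: per-type event counts in [cutoff - window, cutoff).
    label: number of `target` events in [cutoff, cutoff + window).
    Assumption: only patients with an event in the observation window are kept.
    """
    before = {}
    after = Counter()
    for pid, etype, month in events:
        if cutoff - window <= month < cutoff:
            before.setdefault(pid, Counter())[etype] += 1
        elif cutoff <= month < cutoff + window and etype == target:
            after[pid] += 1
    return [(pid, [before[pid][e] for e in EVENT_TYPES], after[pid])
            for pid in sorted(before)]


def split_90_10(dataset, seed=0):
    """Randomly hold out 10% of the rows (at least one) for testing."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    n_test = max(1, len(rows) // 10)
    return rows[n_test:], rows[:n_test]
```

For example, with `events = [(1, "fT3", 2), (1, "FNAC", 20), (1, "FNAC", 25), (2, "ultrasound", 10), (3, "fT4", 30)]`, calling `build_dataset(events, cutoff=18, target="FNAC")` yields two rows: patient 1 with one fT3 in the past window and a label of 2, and patient 2 with one ultrasound and a label of 0. Repeating `split_90_10` with different seeds gives repeated train/test runs over which a predictor's performance and its dispersion can be summarised.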
The 18-month time window reflects, to some extent, the nature of the treatments performed in the considered centre: it represents the common follow-up time, and includes a prudential margin to provide enough informative content for predicting the following 18 months. Of course, this value can be straightforwardly adapted for different departments.
For the sake of this experimental analysis, we considered six well-known algorithms for building predictors, ranging from naive approaches (used as baselines) to widely used Machine Learning techniques.
Mean: considering the entire training set, it calculates the density of each kind of event over time (how many occur, on average, per month) and uses this density to predict how many events are expected in the future.
TipOver: each prediction is simply made by replicating the past recorded events. More specifically, for each patient, the kind and number of events in the next x months are exactly the same as in the previous x months.
k-nearest neighbours algorithm (kNN): identifies the 8 most similar clinical cases and uses them to estimate the future, exploiting the mean of the events that occurred in the past 18 months. The metric is built on an n-dimensional space, where n is the number of kinds of events. In this way, any patient can be seen as a point, and the Euclidean distance is used to select the neighbourhood. The axes are normalised between 0 and 1 to avoid overweighting the most frequent events.
Generalised linear model (lm): uses generalised linear regression, fitted on the entire training data set, to estimate the next 18 months.
Random Forest (rf): Random Forests are a combination of predictors in which each predictor is randomly generated and all predictors have the same weight. We built Random Forest-based models using 500 random trees.
Support-vector machine (svm): Support Vector Machine-based models that exploit a Gaussian kernel to perform the prediction.
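The two naive baselines listed above can be sketched in a few lines. This is an illustration under our own assumptions: the Mean "density" is taken as the average number of target events per patient per month observed in the training data, and TipOver is applied to per-patient counts of the target event in the previous window. The function names are hypothetical.

```python
def mean_forecast(train_counts, n_patients, window=18, horizon=18):
    """Mean baseline: total expected target events for `n_patients` patients.

    train_counts: per-patient target-event counts over a `window`-month span
    in the training set. Assumption: one shared density (events per patient
    per month) is applied uniformly to every patient over the horizon.
    """
    density = sum(train_counts) / (len(train_counts) * window)
    return density * horizon * n_patients


def tipover_forecast(past_counts):
    """TipOver baseline: each patient repeats the previous window's count.

    past_counts: {patient_id: target-event count in the previous window}.
    Returns the per-patient forecast and the total expected workload.
    """
    forecast = dict(past_counts)
    return forecast, sum(forecast.values())
```

For instance, `mean_forecast([2, 0, 1, 1], n_patients=10)` gives approximately 10.0 (a density of 1/18 events per patient per month over 18 months), while `tipover_forecast({"a": 2, "b": 0})` returns `({"a": 2, "b": 0}, 2)`.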
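The kNN predictor described above can be sketched as follows, assuming each patient is represented by the vector of per-type event counts in the 18-month observation window, and that the estimate is the mean future count of the target event among the k = 8 nearest training patients (this reading of "the mean of the events" is our assumption). Min-max scaling to [0, 1] is fitted on the training rows, and `math.dist` computes the Euclidean distance. Function names are hypothetical.

```python
import math


def fit_min_max(rows):
    """Return a function scaling each feature to [0, 1] using the training rows."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def scale(row):
        # Constant features carry no information and are mapped to 0.
        return [(v - l) / (h - l) if h > l else 0.0
                for v, l, h in zip(row, lo, hi)]

    return scale


def knn_predict(train_X, train_y, query, k=8):
    """Mean target count of the k training patients closest to `query`.

    train_X: per-type event counts in the observation window, one row per patient.
    train_y: each training patient's target-event count in the following window.
    """
    scale = fit_min_max(train_X)
    q = scale(query)
    distances = sorted(
        (math.dist(q, scale(x)), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]  # uses all rows if fewer than k are available
    return sum(y for _, y in nearest) / len(nearest)
```

For example, `knn_predict([[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]], [0, 1, 1, 2, 20], [0, 0], k=2)` returns 0.5. Scaling matters here: without it, a single frequently prescribed exam (such as fT3) would dominate the distance.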
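For the lm predictor, the text does not specify the GLM family or link function. The sketch below assumes the Gaussian family with identity link, which reduces to ordinary least squares with an intercept, and solves the normal equations in pure Python. A tiny ridge term (our addition) keeps the system solvable when some event type never occurs in the training data; in practice a statistics library would be used instead. Function names are hypothetical.

```python
def fit_linear(X, y, ridge=1e-9):
    """Least-squares fit of y ~ b0 + b . x; returns [b0, b1, ..., bn]."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # intercept column first
    p = len(A[0])
    # Normal equations: (A^T A + ridge * I) w = A^T y
    M = [[sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return w


def predict_linear(w, row):
    """Apply the fitted coefficients to one patient's feature row."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
```

Fitting `fit_linear([[0], [1], [2], [3]], [1, 3, 5, 7])` recovers an intercept of about 1 and a slope of about 2, so `predict_linear(w, [4])` is close to 9.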
2.2 Domain Expert
Dispersion must also be taken into account, as a high value indicates that the corresponding predictor's performance can vary greatly according to the considered circumstances. The solid horizontal (red) line represents the performance of the human expert. In this case, no dispersion value can be shown, as the expert made only a limited number of estimations, due to the complexity of the task when performed manually.
The results presented in Fig. 1 indicate that the machine learning-based predictors tend to provide better estimates than the very basic Mean and TipOver approaches. However, even such naive approaches can deliver good performance in a couple of cases, indicating that the corresponding events are easy to predict, given a suitable amount of available information. Notably, in some cases the Mean approach delivers predictions that outperform the human expert: even such naive approaches can therefore be useful in supporting humans, by clearly highlighting regular patterns that would otherwise be hard to identify. On the other hand, more sophisticated ML approaches tend to consistently deliver better performance on more complex cases as well.
In most of the considered cases, the performance of the human expert is impressive, even though ML-based techniques can still help reduce mistakes and improve predictions. Notably, the human expert has been making workload estimations for the considered centre for more than 20 years. Therefore, it is safe to assume that the delivered predictive performance is a very accurate representation of the best performance that can be achieved by a human. Further, the workload estimation task is very time-consuming, and the results can vary significantly according to the experience of the human expert. The more experienced the expert, the better the predictions are expected to be; however, one must also factor in the fact that more experienced humans are extremely valuable resources, who should spend their precious time on more critical tasks. From this perspective, ML-based approaches deliver generally good performance in estimating the workload for all the considered clinical events, and are extremely quick.
Interestingly, no single algorithm outperforms all the others in all the considered prediction tasks. On the one hand, this suggests that the clinical events we focused on are suitable for empirically comparing approaches, as they pose very different challenges to predictors. On the other hand, the results also suggest that an ensemble predictor may best suit the needs of a clinical department. An ensemble approach in which a different predictor is trained for each event may therefore deliver robust and reliable performance.
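A minimal sketch of such a per-event selection is shown below. It assumes that every candidate predictor has already produced predictions on held-out data for each event, and it uses mean absolute error as the selection criterion; the error measure behind Fig. 1 is not stated in this section, so MAE is our assumption, as are the function names.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def select_predictor_per_event(results):
    """Pick the lowest-error predictor for each clinical event.

    results: {event: {predictor_name: (predicted_counts, actual_counts)}}
    Returns {event: predictor_name}; ties go to the first name listed.
    """
    return {
        event: min(scores, key=lambda name: mean_absolute_error(*scores[name]))
        for event, scores in results.items()
    }
```

For example, given `{"FNAC": {"mean": ([1, 1], [0, 2]), "knn": ([0, 2], [0, 2])}, "fT3": {"mean": ([5], [5]), "rf": ([3], [5])}}`, the function returns `{"FNAC": "knn", "fT3": "mean"}`, mirroring the observation that different events favour different predictors.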
Workload estimation is pivotal for optimising the use of resources in modern hospital departments. However, despite its importance, this task is mostly performed by human experts. Experts require a significant amount of time to perform this task, and the results are highly dependent on the experience of the expert. In this paper, we investigated the use of machine learning approaches for efficiently performing this tedious yet pivotal task.
The experimental analysis we performed demonstrates that it is possible to exploit machine learning-based predictors to accurately estimate the workload of a clinical department, in terms of the occurrences of a number of personnel-intensive or lab/cost-intensive clinical events. In other words, human experts can be relieved of the burden of performing such a time-consuming task, which has significant implications in terms of optimisation. Firstly, senior experts will have more time available to dedicate to more relevant matters. Secondly, quick and accurate ML-based predictions can be used as input to schedule optimisers, in order to optimise the allocation of resources via more robust and better-informed scheduling.
We see several avenues for future work. Firstly, we are interested in investigating ensemble-based approaches for maximising predictive performance over a wide range of clinical events. Secondly, we plan to extend our analysis to different departments, in order to evaluate how general the presented results are. Thirdly, we are interested in evaluating whether sharing information between departments of different hospitals, by leveraging privacy-preserving approaches, can help improve the performance of predictors. Finally, we will focus on approaches aimed at integrating the strengths of machine learning with the capabilities of human experts, possibly using an overarching framework that encompasses all the relevant steps of the process.
- 7. Dieleman, J.L., et al.: National spending on health by source for 184 countries between 2013 and 2040. Lancet 387(10037), 2521–2535 (2016)
- 9. Gatta, R., et al.: On the efficient allocation of diagnostic activities in modern imaging departments. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds.) EPIA 2015. LNCS (LNAI), vol. 9273, pp. 103–109. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23485-4_10
- 10. Guzman-Castillo, M., et al.: Forecasted trends in disability and life expectancy in England and Wales up to 2025: a modelling study. Lancet 2(1), e307–e313 (2017)