1 Introduction

Real-time asset condition data from embedded sensors, microprocessors, and communication technologies have extensively automated industrial operations over recent decades. Asset failure prognosis, in particular, has evolved from physics-based formulations to data-driven machine learning (ML) algorithms. Data-driven prognosis is considerably more flexible in terms of evaluating asset reliability in real time and the diversity of failures that can be predicted [5, 10, 22].

Prognosis also forms the bedrock of servitisation, saving production costs and time for the manufacturing industries by enabling real-time asset health analyses. Manufacturers selling servitised contracts are responsible for asset uptime and the associated maintenance costs [28]. This is made possible by predictive maintenance policies characterised by optimal procurement of spare parts and labour, thus saving significant production costs for the service providers [11, 21]. Data-driven prognosis techniques enable manufacturers to predictively plan maintenance activities for imminent failures, in contrast to traditional preventive and corrective maintenance policies that rely on a fixed maintenance schedule or plan of action.

Data-driven prognosis involves evaluating either the time until asset failure or the probability of the asset failing within a fixed time window in the future [34]. Regression or classification models are used for prediction in the respective cases. A predictive model is usually trained on time series of observed asset condition data. Such a time series is known as a trajectory, in which the asset condition is monitored from its healthy state onwards [30]. Predictive models are trained using the asset condition parameters at each time-step as features and the remaining time before failure (if observed) as the required output. Comprehensive information about the algorithms used for optimising model parameters, feature extraction techniques, and evaluation measures can be found in [34, 40].
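
As a minimal, purely illustrative sketch (with synthetic data and assumed dimensions), a single run-to-failure trajectory can be framed for regression-based prognosis as follows.

```python
# Illustrative only: framing one run-to-failure trajectory for supervised learning.
# Condition readings at each time-step are the features; the remaining time before
# the observed failure is the regression target. Data and dimensions are synthetic.
import numpy as np

trajectory = np.random.rand(50, 4)              # 50 time-steps, 4 condition parameters
failure_time = len(trajectory)                  # failure observed after the last time-step

X = trajectory                                  # features per time-step
y = failure_time - np.arange(len(trajectory))   # remaining time before failure per time-step
```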

1.1 Weibull time-to-event recurrent neural networks

In this context, the Weibull time-to-event recurrent neural network (WTTE-RNN) is a simple and versatile algorithm for discrete event prediction. It is simple enough to avoid data sparsity and, at the same time, provides operators with the information necessary for maintenance planning, such as whether the risk is increasing, decreasing, or stationary. WTTE-RNN combines survival theory with recurrent neural networks to generate a failure density curve for a given asset in real time. It involves optimising the parameters of a Weibull log-likelihood loss function describing the probability of occurrence of an event over the future time horizon of a given asset. The resulting failure density curve enables maintenance managers to make risk-based maintenance decisions depending on its variance: a higher variance indicates a less confident prediction and vice versa. The Weibull function can be used for either discrete or continuous events, but the present application of industrial prognosis is treated as discrete event prediction.

The parameters of the Weibull loss function are optimised using a recurrent neural network (RNN). RNNs are Turing-complete and can learn complex temporal patterns, corresponding to the typical characteristics of industrial failure data such as nonlinearity, noise, and time dependency [27, 29, 30]. The caveats of using recurrent neural networks, however, are that they require more computational resources than other regression methods and that they are prone to overfitting [35]. The first problem was handled in the experiments discussed here via parallel processing across CPU cores, whereas overfitting was controlled by early stopping and a network with few free parameters.
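
A minimal sketch of such a compact recurrent network trained with early stopping is given below; the layer sizes, data shapes, and training settings are illustrative assumptions rather than the configuration used in the experiments.

```python
# Illustrative sketch: a small recurrent network with early stopping to limit
# overfitting. Placeholder data are used; the loss here is a plain MSE and does
# not represent the WTTE-RNN loss discussed later.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_timesteps, n_features = 30, 12                           # assumed sequence dimensions
x_train = np.random.rand(100, n_timesteps, n_features)     # placeholder training data
y_train = np.random.rand(100, 1)

model = keras.Sequential([
    layers.Input(shape=(n_timesteps, n_features)),
    layers.GRU(8),                                         # few free parameters
    layers.Dense(1, activation="softplus"),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping halts training once the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.2, epochs=200,
          callbacks=[early_stop], verbose=0)
```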

Another significant advantage of WTTE-RNN is its ability to use both censored and uncensored failure data for learning the failure statistics of the fleet concerned [27, 29]. Industrial condition data often originate from assets that do not fail and are therefore unusable by off-the-shelf data-driven prognosis algorithms [23]. WTTE-RNN, however, is able to incorporate the condition of not having failed up to the current time-step in its loss function, discussed later in Sect. 3, and therefore accounts for the censored data in the fleet by flattening the failure density function up to that time-step.

WTTE-RNN was first presented, as a consumer churn prediction algorithm and more generally a discrete event prediction algorithm, in the Master's thesis [27]. Comparing against a standard binary fixed-window prediction algorithm, [27] concluded that promising results were achieved with varying degrees of censoring and temporal resolution on the run-to-failure data of simulated turbofans. Consequently, WTTE-RNN enjoyed immediate popularity in discrete event prediction applications, mostly in the medical domain, for example in [8] for hypoglycemic event prediction based on data collected by wearable devices and in [20] for breast cancer-related data. Several other applications in the medical prognosis field exist, such as [4, 18, 39]. WTTE-RNN also finds applications in the telecom industry, such as [33], which compared a multilayer perceptron (MLP) with the WTTE-RNN algorithm for customer churn prediction. Based on their experimental results, however, [33] claimed that the MLP performed better than WTTE-RNN, achieving higher AUC, precision, and recall. In another application, for change point detection in industrial systems data, [17] compare the OC-SVM technique with WTTE-RNN and conclude that WTTE-RNN eliminates the dependence on the choice of training data and network hyperparameters, making it a good model in the absence of labelled data for cross-validation.

The first industrial prognosis applications, predicting failures in a fleet of simulated turbofans, feature in [29] and [30]. Both papers [29, 30] achieved performance improvements for WTTE-RNN via the technique of collaborative prognosis, which identifies sub-fleets of similarly operating assets and enables learning across them by selectively pooling the training datasets. [26] presented an extensive analysis of using WTTE-RNN for prognosis of simulated turbofan failures, comparing WTTE-RNN with other standard prognosis techniques and concluding that WTTE-RNN was the most flexible and accurate. Other applications of WTTE-RNN for failure prediction include [1, 15, 16].

The literature also presents several variants of WTTE-RNN developed to make it better suited to specific applications, such as WTTE-CNN-LSTM in [38], where a convolutional neural network is first used for feature extraction, followed by a long short-term memory network instead of a plain RNN. Based on a case study of failure prediction for a construction machinery component and on theoretical grounds, [38] propose that the WTTE-CNN-LSTM model has higher prediction accuracy. [2] presented experiments to identify an optimal neural network architecture of WTTE-RNN for machine failure prognosis. In another thesis [26], the author proposes several techniques and considerations for improving the performance of WTTE-RNN, especially for industrial prognosis applications. Lastly, [3] compares several algorithms for discrete event prediction, focusing on churn prediction, and notes that deep learning-based techniques such as WTTE-RNN show promising performance, expressing the hope that the reliability of such techniques will increase in the future.

As such, the authors deem WTTE-RNN a highly suitable algorithm for industrial prognosis, owing to its ability to support risk-informed decision-making and to account for censored data while generating the failure statistics of the fleet.

1.2 Histogram data for industrial prognosis

Asset condition data are often recorded by industries as histograms, owing to their memory efficiency and homogeneity across variables [13, 24]. Formally, the histogram data used for the experiments in this paper are categorical histogram data, where a frequency is assigned to each bin [7]. Histograms differ from the scalar inputs typically expected by ML algorithms and must be systematically preprocessed to extract the information they contain. More broadly, complex data structures differing from the commonly encountered numeric and categorical variable types are studied in the field of symbolic data analysis [7]. To the best of the authors' knowledge, however, the literature presents only a few instances that specifically target prognosis using histogram data.

These include the works of [9], [32], and [12,13,14], who investigated compressor failures, battery failures, and NOx sensor failures in heavy-duty trucks, respectively. [9] did not clearly outline the preprocessing steps applied to the histogram data, and the study of [32] was limited by the small fleet size used for analysis. The closest of the above three works to the case study discussed in this paper is that of [12,13,14], who used a dataset very similar to the one used here, but for a different prognosis technique and target component. The authors in [12,13,14] used a random forest algorithm for classifying the data into failure and non-failure classes. However, their technique relied on recursively combining the histogram bins using a sliding window, in contrast to the entropy-based measure of the histogram distributions used in this paper. The technique is detailed in later sections, but it essentially relies on evaluating the relative entropies of consecutive histograms to quantify the deterioration in a reduced dimension.

This paper presents the first industrial use case of WTTE-RNN for prognosis. The aim was to predict turbocharger failures in a fleet of heavy-duty trucks owned by Scania CV (referred to as Scania in the following text), so that maintenance activities can be planned in real time. A number of truck components have no associated maintenance model; hence, Scania currently maintains them in a run-until-failure setting. Drivers can sometimes sense the symptoms of an imminent failure and avoid it, but in many cases failures lead to high costs for late goods delivery, re-loading, and towing trucks to workshops. Lately, Scania has been facing increased demands on operational availability from many customers. Accurate prognosis algorithms, such as WTTE-RNN, lower customer costs and improve the uptime of Scania trucks. WTTE-RNN also improves the situation at workshops, as more work shifts from unplanned to planned workshop visits. Since the condition data used in the case study were recorded as a time series of sparsely sampled histograms, a technique to preprocess such data is also shown.

In particular, the objectives of the paper are to (1) evaluate the efficacy of WTTE-RNN for prognosis using an industrial dataset and (2) present a technique for preprocessing sparsely sampled histogram data for prognosis.

The paper is structured as follows: Section 2 explains the truck-fleet data and its preprocessing steps. The WTTE-RNN algorithm is briefly discussed in Sect. 3, where the Weibull discrete loss function that allows the model to be trained on censored and uncensored failure data is described mathematically. Section 4 presents the experimental cases designed to analyse the efficacy of WTTE-RNN. The results obtained from the experiments are presented and discussed in Sect. 5. Finally, the main conclusions are summarised in Sect. 6.

2 Dataset description

This section discusses the structure of, and preprocessing steps for, the turbocharger health data obtained from the Scania truck fleet. First, however, the physical properties of the turbocharger are briefly discussed.

The turbocharger is a modern improvement to internal combustion engines that increases the density of the air entering the combustion chamber. The increased density of the inlet air improves power output and efficiency compared to a naturally aspirated engine. A turbocharger is powered by the engine exhaust; further information can be found in [37]. Turbochargers are critical to a truck's operation, and their failure results in unplanned vehicle downtime and increased service costs for the maintenance provider.

Turbocharger health in the Scania truck fleet was monitored by recording internal and environmental operating parameters, such as ambient temperature and pressure, axle loads, and boost air pressure, over the course of the trucks' usage. Each parameter for a given truck was recorded as a histogram: the range of possible values for the parameter was divided into intervals (bins), and the value accumulated in each bin represented the time the truck had spent operating in that interval since it was commissioned. Whenever the conditions for sharing the data were met, the current bin values were recorded as a snapshot of data.

Over the course of observation, a temporal evolution of the histograms could thus be obtained and analysed for failure prognosis. However, snapshots of these values were recorded at irregular intervals depending on the truck age, the data contract with the customer, network capability, etc. This led to irregular and sometimes sparsely sampled time-series data. The number of samples (snapshots of the histograms) per truck ranged from 10 to 30.

The data used for the case study presented here were collected from trucks that were monitored for a fixed period of time after they commenced operation. Within this period, certain trucks experienced turbocharger failures while the rest did not. Prognosis here is therefore a survival analysis problem: given a population of trucks observed over time, the goal is to predict the time until a new truck fails, given its turbocharger health data.

The fleet data were also atypical due to the customisation offered by Scania. Customised trucks are best suited to their end use but at the same time diversify the fleet, thereby making turbocharger prognosis challenging. The specifications were categorical and did not evolve over time, for example the type of exhaust manifold, trailer connection, braking system, etc. Apart from the histogram and categorical types, certain parameters were also recorded as plain scalars. The combined set of all histogram, categorical, and scalar parameters constituted the overall operating conditions, and a representative sample is shown in Table 1. The Date_recorded column in Table 1 contains scaled dates, whose purpose is to illustrate the irregularity of the sampling. Among the table headings, the Cat_ columns denote the categorical variables, the Scalar_ column denotes the scalar variable, and the H1_b columns denote the histograms, where the histogram and the corresponding bins are indexed. A single row from this table is further shown schematically in Fig. 1 using a histogram representation. Preprocessing this operating condition data into the input features for WTTE-RNN is detailed in Sect. 2.1.

Table 1 Representative example of the dataset containing different types of features in tabular format
Fig. 1

Schematic description of a set of features recorded as histograms. The values in the array of recorded data correspond to the heights of the histogram bars, which in turn represent the time spent by the truck in the corresponding bins

2.1 Preprocessing the dataset

The steps followed while preprocessing the Scania dataset are explained here and summarised as a flowchart in Fig. 2. Illustrations describing the corresponding changes in the raw data are shown wherever necessary in the flowchart.

As a first step, trucks with at least 10 snapshots in the 200 time-steps before the end of the study were selected from the overall dataset. Of all the trucks present in the fleet, 14% encountered turbocharger failures. Next, the features relevant for predicting turbocharger failures were selected from all the parameters monitored by the sensors. Table 2 describes the histogram, categorical, and scalar features that were deemed useful for turbocharger prognosis, along with examples corresponding to each feature type. These were selected based on a combination of expert knowledge and their prevalence across the fleet. The numbers of bins in the histogram variables were 8 (4 variables), 10 (3), 12 (3), 18 (2), and 19 (2), where the numbers in parentheses denote how many operating parameters had the corresponding number of bins.

The condition data also comprised features that were co-recorded as matrices, which are essentially two-dimensional histograms with one feature on each axis. For example, a matrix of axle load vs. speed is a two-dimensional histogram that records the time the truck spends in a given speed/load combination. If the possible range of load is divided into 5 bins and that of speed into 8 bins, the matrix has \(5 \times 8 = 40\) bins in total. An example of such a matrix is shown in Fig. 2. Matrices were converted into their constituent histograms by summing the elements along each axis, as illustrated in Fig. 2.
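
As an illustration of this step, a two-dimensional histogram can be collapsed into its two marginal one-dimensional histograms by summing over each axis; the array and variable names below are illustrative, and the 5 × 8 load/speed binning follows the example in the text.

```python
# Minimal sketch: collapsing a two-dimensional histogram ("matrix" feature) into
# its constituent one-dimensional histograms by summing over each axis. Values
# are synthetic.
import numpy as np

load_speed_matrix = np.random.randint(0, 100, size=(5, 8))  # time per load/speed bin

load_histogram = load_speed_matrix.sum(axis=1)   # 5 load bins (summed over speed)
speed_histogram = load_speed_matrix.sum(axis=0)  # 8 speed bins (summed over load)
```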

Because the histograms stored the time spent by the truck since its commissioning, a given snapshot of bin values had to be subtracted from the previous one, so that the histograms represented the truck's operation between consecutive snapshots. The next preprocessing step therefore involved evaluating per-bin differences between subsequent snapshots (i.e. \(H^t_i - H^{t-1}_i\) for \(i = 1, \dots , n\)) and normalising them (i.e. \(H^t_i = \frac{H^t_i - H^t_\mathrm{min}}{H^t_\mathrm{max} - H^t_\mathrm{min}}\)), where \(H^t_i\) is the \(i^\mathrm{th}\) bin of a histogram consisting of n bins recorded at time-step t, and \(H^t_\mathrm{max}\) and \(H^t_\mathrm{min}\) are the maximum and minimum bin values recorded for that histogram, respectively. An example of varying bin values over the course of a truck's usage is shown in Fig. 2, where it is clear how the distribution changes as the truck ages and its performance deteriorates over time. The categorical features were binary-encoded, and the scalar features were min-max normalised.
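
A minimal sketch of this differencing and normalisation step is given below, assuming standard min-max scaling per differenced snapshot; the function and array names are illustrative.

```python
# Sketch of the differencing and normalisation step, assuming min-max scaling.
# `snapshots` is a (n_snapshots, n_bins) array of cumulative bin values for one
# histogram feature of one truck.
import numpy as np

def preprocess_histogram(snapshots: np.ndarray) -> np.ndarray:
    # Per-bin differences of consecutive snapshots: operation between snapshots
    # rather than cumulative totals since commissioning.
    deltas = np.diff(snapshots, axis=0)
    # Min-max normalise each differenced snapshot.
    mins = deltas.min(axis=1, keepdims=True)
    maxs = deltas.max(axis=1, keepdims=True)
    return (deltas - mins) / np.where(maxs > mins, maxs - mins, 1.0)
```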

However, the number of features still needed to be reduced, because (1) a neural network would not be able to distinguish between bins belonging to separate features, and (2) with each bin as a separate input the neural network would have more than 300 inputs, causing a high-dimensional input problem. The second problem, commonly known as the curse of dimensionality and a cause of overfitting, is explained extensively in [36].

Fig. 2

Flowchart describing the steps followed while processing the Scania dataset for the case study

To address the above challenges and compress the information contained in the histograms, the overlap between consecutive histograms was evaluated in the next step. The overlap, referred to here as the relative entropy of the two distributions, served as a measure of the deviation in the truck's condition from the previous snapshot and therefore also of the drop in its performance. The overlap was evaluated by summing the minima of the corresponding bin values of the subsequent normalised histograms, i.e. \(\sum _{i=1}^{N}\mathrm{min}(A_i^{t-1},A_i^{t})\), where A is a histogram feature containing N bins, i is the bin index, and t is the timestamp index. A similar overlap was also evaluated with respect to the very first histogram (representing the truck's healthiest state) and added as another input column.
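
The overlap computation can be sketched as follows; the function names and data layout are illustrative assumptions.

```python
# Sketch of the overlap measure described above: the sum of the element-wise
# minima of two normalised histograms.
import numpy as np

def histogram_overlap(prev_hist: np.ndarray, curr_hist: np.ndarray) -> float:
    return float(np.minimum(prev_hist, curr_hist).sum())

def overlap_features(hist_sequence: np.ndarray) -> np.ndarray:
    # hist_sequence has shape (n_snapshots, n_bins). Two features per time-step:
    # overlap with the previous snapshot, and overlap with the first
    # (healthiest-state) snapshot.
    first = hist_sequence[0]
    feats = []
    for t in range(1, len(hist_sequence)):
        feats.append([histogram_overlap(hist_sequence[t - 1], hist_sequence[t]),
                      histogram_overlap(first, hist_sequence[t])])
    return np.array(feats)
```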

After the above step, every histogram was reduced to only two features, i.e. the overlaps evaluated with the antecedent histogram and with the first histogram representing the truck's healthiest state. The total number of features was reduced to about 1/30th of the original count, and the design matrix was a 3-D array of shape (fleet size \(\times \) features \(\times \) max RUL), where max RUL was the maximum trajectory length (\(=\) number of snapshots) among the trucks that failed during the study. Trucks with trajectories shorter than max RUL were padded with an arbitrary, constant mask value to complete the array, and the masked values were ignored while training the model. This is the input format required by the Keras RNN library; more information can be found in [19].
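
A sketch of how the padded design tensor and masking might be set up for a Keras RNN is shown below; the mask value, layer sizes, and the (samples, timesteps, features) ordering follow common Keras usage and are assumptions rather than the exact implementation used here.

```python
# Sketch of assembling a padded, masked design tensor for a Keras RNN.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MASK_VALUE = -99.0   # arbitrary constant used to pad short trajectories

def build_design_tensor(trajectories, max_rul, n_features):
    # trajectories: list of (trajectory_length, n_features) arrays, one per truck
    x = np.full((len(trajectories), max_rul, n_features), MASK_VALUE)
    for i, traj in enumerate(trajectories):
        x[i, :len(traj), :] = traj[:max_rul]
    return x

# A Masking layer makes the RNN skip the padded time-steps during training.
model = keras.Sequential([
    layers.Input(shape=(None, 4)),           # 4 is a placeholder feature count
    layers.Masking(mask_value=MASK_VALUE),
    layers.GRU(8),
    layers.Dense(2, activation="softplus"),  # e.g. the two Weibull parameters
])
```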

Table 2 Description of the histogram, categorical, and scalar features selected for turbocharger prognosis

3 WTTE-RNN algorithm

A brief description of the WTTE-RNN algorithm used here for prognosis is provided in this section. The WTTE-RNN algorithm combines techniques from survival analysis and recurrent neural networks: a bespoke log-likelihood loss function is used to train a recurrent neural network to output the two parameters of a Weibull probability distribution of the time to event, given a vector of asset health data. The reader is referred to [27] for comprehensive details. The application of WTTE-RNN is the same as in [29], but with differently processed condition data from an industrial fleet of trucks.

The proposed log-likelihood function to be maximised by the recurrent neural network is:

$$\begin{aligned}&\mathrm{log}(L) = \sum _{n=1}^{N}\sum _{t=0}^{T_n} \bigg [ u_t^{n}\mathrm{log}[Pr(Y_t^n = y_t^n|x_{0:t}^{n})] \nonumber \\&\quad +(1-u_t^n)\mathrm{log}[Pr(Y_t^n > y_t^n|x_{0:t}^n)] \bigg ] \end{aligned}$$
(1)

where \(u_t^n\) indicates whether the observation at time t is censored (\(u^n_t = 0\) if the real failure time has not yet been observed). The first term, \(u_t^{n}\mathrm{log}[Pr(Y_t^n = y_t^n|x_{0:t}^{n})]\), means that if the real time to failure has been observed (\(u^n_t = 1\), uncensored), the probability of the predicted time to failure \(Y_t^n\) being equal to the real time to failure \(y_t^n\) is maximised, given the known values of the time series of preprocessed histograms up to time t, \(x^n_{0:t}\). The second term, \((1-u_t^n)\mathrm{log}[Pr(Y_t^n > y_t^n|x_{0:t}^n)]\), means that if the real time to failure has not yet been observed (\(u^n_t = 0\), censored), the probability of the predicted time to failure \(Y_t^n\) being greater than \(y_t^n\), the time up to which no failure is known to have occurred, is maximised instead. The double summation \(\sum _{n=1}^{N}\sum _{t=0}^{T_n}\) runs over all recorded failure trajectories (N) and over all time-steps of each trajectory \((T_n)\). The probabilities appearing in (1) can be obtained by means of survival analysis; in essence, for the discrete case this translates to:

$$\begin{aligned} \mathrm{log}(L) = u\,\mathrm{log}\big (e^{d(t)} - 1\big ) - \varLambda (t + 1) \end{aligned}$$
(2)

where \(\varLambda (t)\) is known as the cumulative hazard function and \(d(t) = \varLambda (t + 1) - \varLambda (t)\) is the step cumulative hazard function.

If one assumes that the failure probability follows a Weibull distribution, the discrete log-likelihood (added over all trajectories and all time-steps) can be shown to be:

$$\begin{aligned} \mathrm{log}(L_d) = \sum _{n=1}^{N}\sum _{t=0}^{T_n} \bigg ( u_t^{n}\,\mathrm{log}\Big [\mathrm{exp}\Big [\Big (\frac{y_t^n + 1}{\alpha _t^n}\Big )^{\beta _t^n} - \Big (\frac{y_t^n}{\alpha _t^n}\Big )^{\beta _t^n}\Big ] - 1\Big ] - \Big (\frac{y_t^n + 1}{\alpha _t^n}\Big )^{\beta _t^n} \bigg ) \end{aligned}$$
(3)

where \(\alpha _t^n\) and \(\beta _t^n\) are the parameters of the Weibull distribution. The unconstrained optimisation problem to be solved by the recurrent neural network can then be summarised as finding the weights w that maximise \(\mathrm{log}(L_d)\).
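
As a sketch of how Eq. (3) can be implemented as a custom training loss, the snippet below follows the discrete Weibull log-likelihood above; the variable names, the clipping constant, and the reduction over the batch are assumptions added for numerical stability, and masked time-steps would additionally need to be excluded from the sum.

```python
# Hedged sketch of Eq. (3) as a custom Keras/TensorFlow loss. `y_true` carries the
# observed time y and censoring indicator u; `y_pred` carries the Weibull
# parameters alpha and beta produced by the network.
import tensorflow as tf

def discrete_weibull_loglik_loss(y_true, y_pred, eps=1e-9):
    y, u = y_true[..., 0], y_true[..., 1]          # time-to-event and censoring flag
    alpha, beta = y_pred[..., 0], y_pred[..., 1]   # predicted Weibull parameters

    hazard0 = tf.pow((y + eps) / alpha, beta)      # cumulative hazard at y
    hazard1 = tf.pow((y + 1.0) / alpha, beta)      # cumulative hazard at y + 1

    # u = 1 (uncensored): probability mass at y; u = 0 (censored): survival beyond y.
    loglik = u * tf.math.log(tf.exp(hazard1 - hazard0) - 1.0 + eps) - hazard1
    return -tf.reduce_mean(loglik)                 # negative log-likelihood to minimise
```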

In summary, the outputs after the RNN optimises the loss function in (3) are the \(\alpha \) and \(\beta \) values for the Weibull distribution describing the truck’s failure density distribution into the future.

This distribution is parameterised as \(f(t) = \frac{\beta }{\alpha }\big (\frac{t}{\alpha }\big )^{\beta - 1}\mathrm{exp}\big [-\big (\frac{t}{\alpha }\big )^{\beta }\big ]\), where f(t) is the probability of failure at time t. For the results presented here, the mode of this distribution was used as the predicted time to failure for a given truck. Once a new snapshot of histograms was obtained, the distribution was re-evaluated.
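
The mode of a Weibull distribution has the closed form \(\alpha \big (\frac{\beta - 1}{\beta }\big )^{1/\beta }\) for \(\beta > 1\) (and 0 otherwise), so converting the network outputs into a point prediction can be sketched as follows; the function name is illustrative.

```python
# Sketch of turning a predicted (alpha, beta) pair into a point estimate of the
# time to failure via the mode of the Weibull density.

def weibull_mode(alpha: float, beta: float) -> float:
    # For beta <= 1 the density is monotonically decreasing, so the mode is at 0.
    if beta <= 1.0:
        return 0.0
    return alpha * ((beta - 1.0) / beta) ** (1.0 / beta)
```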

An example prediction is shown in Fig. 3, which indicates the data used for the test prediction of a single truck along with the predicted failure probability distribution. Figure 4 shows the true and predicted times to failure for a single truck throughout its lifetime.

Fig. 3

Example showing the predicted failure probability distribution, where the mode corresponds to the predicted time to failure

Fig. 4

True and predicted times to failure for a single truck randomly chosen from the testing dataset, where the x-axis indicates the truck's trajectory of data recorded since its healthiest state

4 Experimental cases

The experiments in the case study were aimed at evaluating the efficacy of the WTTE-RNN algorithm when the prediction model is trained on uncensored data only and on a mix of uncensored and censored data.

The experiments with uncensored data further involved exploring the effects of clustering the trucks based on their technical similarities. Such clustering of assets is an intuitively systematic way of training a prognosis model and is supported by the literature on several occasions. Researchers have shown improved prognosis performance with both static and dynamic clustering of assets, especially for collaborative or transfer learning applications [6, 25, 29,30,31]. The reasoning is that clustering assets based on their operational similarities makes the training dataset homogeneous, thus improving prediction performance. However, clustering also reduces the size of the training dataset, and a training dataset with fewer trucks leads to overfitting, especially when using a model with several free parameters such as an RNN. The trade-off between a statistically homogeneous dataset and a larger training dataset is demonstrated here, given the uneven distribution of trucks across the clusters.

Figure 5a presents the distribution across the clusters of the trucks that encountered a turbocharger failure (the trucks comprising the uncensored training dataset). The x-axes in Fig. 5 denote the combinations of technical specifications present in the fleet, where each letter denotes a specific type of specification. Overall, five clusters were present in the fleet, and the proportion of the fleet present in each cluster is shown along the y-axis. A similar distribution of the trucks that did not encounter a turbocharger failure (the trucks comprising the censored training dataset) is shown in Fig. 5b. The axes in Fig. 5a and b are not numbered in order to protect the details of the industrial dataset used for this case study.

Fig. 5

Bar graphs showing the distributions of failed and non-failed trucks across clusters

4.1 Case 1: only uncensored data used for training

In the first case, only data from the trucks that encountered a turbocharger failure during the study were used for the analysis. The training dataset was divided into clusters of trucks with the same specifications. The prediction performance of WTTE-RNN was then compared between the case where it was trained on individual cluster data only (cluster-specific model) and the case where it was trained on the overall fleet data (general model). The comparison was made on testing datasets drawn from the corresponding individual clusters.

Data from every cluster were split in a 70:30 ratio for training and testing purposes. The cluster-specific models were trained using data from the corresponding clusters only, while the training data from all clusters were pooled to train the general model. It was ensured that the testing datasets were identical when comparing the cluster-specific and general models. Figure 6 schematically describes the training and testing datasets corresponding to each test case.
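
A sketch of this data arrangement is given below, assuming the fleet is already grouped into per-cluster arrays; the helper names and the use of scikit-learn's train_test_split are illustrative.

```python
# Sketch of the Case 1 data arrangement: per-cluster 70/30 splits, with the
# general model trained on the union of all clusters' training portions while
# each cluster keeps its own test set for a like-for-like comparison.
import numpy as np
from sklearn.model_selection import train_test_split

def split_clusters(cluster_data, test_size=0.3, seed=0):
    # cluster_data: {cluster_id: (x, y)} with x of shape (n_trucks, timesteps, features)
    splits = {}
    for cid, (x, y) in cluster_data.items():
        splits[cid] = train_test_split(x, y, test_size=test_size, random_state=seed)
    return splits   # per cluster: (x_train, x_test, y_train, y_test)

def pooled_training_set(splits):
    x_train = np.concatenate([s[0] for s in splits.values()])
    y_train = np.concatenate([s[2] for s in splits.values()])
    return x_train, y_train
```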

Fig. 6

Schematic description of the training and testing datasets used for the experiments with uncensored training data only

The same testing datasets were also used for evaluating the performance of the case where a mixture of censored and uncensored data were used for training. This is explained in the following section.

4.2 Case 2: a mix of censored and uncensored data used for training

The goal of this experimental case was to analyse the effect on prognosis performance of using censored data in addition to uncensored data. A mixture of censored and uncensored training data was therefore used for training the algorithm: the data from the trucks that did not fail were used in addition to the training data used in the first case. Only cluster-specific models were compared in this case.

5 Experimental results

This section presents and discusses the results obtained from the experimental cases explained in Sect. 4.

5.1 Performance evaluation

Concisely evaluating and presenting the performance across the testing dataset was challenging due to the uneven sampling (time gaps between subsequent snapshots) across the trucks, and also within a single truck's trajectory. To address this issue, the shortest trajectory in the testing dataset was divided into ten equal time-segments. The average difference between the true and predicted times at every snapshot within each segment was evaluated, and the values across the testing dataset were plotted as box plots.

This process of segmenting and evaluating the performance is shown in Fig. 7. Using the shortest trajectory ensured that the number of trucks within each time segment was constant, and the company experts confirmed that this time span was more than sufficient for planning maintenance activities.
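
A sketch of this segment-wise evaluation is shown below, under the assumption that true and predicted times to failure are available per snapshot for each test truck; the alignment on the last snapshots before failure and the function names are illustrative.

```python
# Sketch of the segment-wise error evaluation: the shortest test trajectory
# defines ten equal time-segments, and the mean (true - predicted) error per
# segment is collected for each truck, ready for box plotting.
import numpy as np

def segment_errors(true_ttf, pred_ttf, n_segments=10):
    # true_ttf, pred_ttf: lists of per-truck arrays aligned on snapshots
    shortest = min(len(t) for t in true_ttf)
    edges = np.linspace(0, shortest, n_segments + 1, dtype=int)
    per_segment = [[] for _ in range(n_segments)]
    for t, p in zip(true_ttf, pred_ttf):
        err = np.asarray(t[-shortest:]) - np.asarray(p[-shortest:])
        for s in range(n_segments):
            per_segment[s].append(err[edges[s]:edges[s + 1]].mean())
    return per_segment   # one list of per-truck mean errors per segment
```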

Fig. 7

Schematic representation of segmenting the time series into equal-time segments to evaluate algorithm performance

5.2 Results and discussion

This section presents and discusses the results obtained from the experiments, using the evaluation measure described above.

The results obtained from the uncensored training dataset experiments are shown in Fig. 8, with the corresponding clusters mentioned in the captions.

  1.

    The box plots labelled Clusters: ‘Y’ show the performances of the cluster-specific models, and the general model performance is indicated by Clusters: ‘N’. Different time segments, from the 120th time-step until failure, are shown along the x-axes, where the average true - predicted errors were calculated for each truck in the testing dataset. Each segment was 12 time-steps long. The y-axes of the plots represent the calculated errors in the same time units.

  2.

    It is observed in Fig. 8 that the interquartile ranges of the errors of the cluster-specific models shown in subfigures a, b, and c lie within ± 25 time-steps. The company experts believe this provides them with sufficient time to plan maintenance measures and mitigate imminent failures.

  3.

    The plots in Fig. 8a, b, and c also support the rationale for clustering, namely that it systematically homogenises the training dataset. This benefit is observed in that the means of the evaluated errors are near zero and their variances are much smaller than those of the general model.

  4.

    Nevertheless, clustering is not visibly advantageous in the plots of subfigures d and e. This difference in the benefit from clustering is attributed to the smaller training datasets for the latter clusters: the trucks in the latter two clusters number roughly one-third of the trucks in each of the former three clusters. However, the cluster-specific models perform similarly to the general model in terms of error medians and variances.

  5.

    Moreover, the errors (of the cluster-specific models) are initially negative but trend towards positive as the trucks approach the point of failure. This means that WTTE-RNN under-predicts the time to failure, thus providing the maintenance managers with a conservative estimate, further reducing the adversity of the failure even when the predictions are less accurate.

Fig. 8

Comparison of cluster-specific and general models for the uncensored training data case; the clusters are mentioned in the sub-captions

Similar plots presenting the results from the experiments with a mixture of uncensored and censored training data are shown in Fig. 9. Unlike Fig. 8, only the performances of the cluster-specific models are presented here, because it is concluded from Fig. 8, as discussed in the points above, that the cluster-specific model performs better than the general model. The corresponding cases are mentioned in the captions, and the performance was evaluated using the same testing datasets as in Fig. 8. The following salient features can be observed in Fig. 9:

  1.

    Here again, the interquartile error ranges of the predictions in Fig. 9a and c are within actionable ranges for the maintenance managers to plan mitigation measures. The plot in Fig. 9d is ambiguous due to the small amount of testing data relative to the training data once the censored data were included. The poor performance of the algorithm in Fig. 9b and e is due to the extremely high imbalance between censored and uncensored training data. This is discussed further in the following points.

  2.

    From Fig. 9a and c, the performance of WTTE-RNN is observed to improve, especially in terms of reduced error variance, compared to Fig. 8a and c where only uncensored data were used for training. The error variance in the time segment just before failure is nearly halved when a combination of censored and uncensored data is used for training WTTE-RNN. This is because the algorithm is now able to make better informed estimates of the Weibull parameters \(\alpha \) and \(\beta \).

  3.

    However, no such improvement is observed for Fig. 9b and e, due to the highly imbalanced censored and uncensored training datasets. In both cases, the censored training data are 15 times the uncensored training data. This causes over-generalisation of the Weibull loss function, which is then trained such that the trucks fail with a very low probability.

Fig. 9

True - predicted time-to-failure errors shown for a mixture of uncensored and censored data

To investigate the reason given in the last point above, an extension of this experiment was conducted in which the proportion of uncensored to censored data was systematically decreased through ratios of 4:1, 1:1, and 1:15 for the clusters shown in Fig. 9b and e. The resulting errors are plotted in the same format in Fig. 10.
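
A sketch of how such training mixes could be constructed, by subsampling the censored trucks to a target uncensored-to-censored ratio, is given below; the function and variable names are illustrative.

```python
# Sketch of constructing a training mix with a given uncensored:censored ratio
# by subsampling the censored trucks.
import numpy as np

def mix_training_data(uncensored_idx, censored_idx, ratio, seed=0):
    # ratio = uncensored : censored, e.g. 4.0 for 4:1, 1.0 for 1:1, 1/15 for 1:15
    rng = np.random.default_rng(seed)
    n_censored = min(len(censored_idx), int(round(len(uncensored_idx) / ratio)))
    sampled = rng.choice(censored_idx, size=n_censored, replace=False)
    return np.concatenate([uncensored_idx, sampled])
```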

It is seen in Fig. 10 that the prediction performance improves (especially in Fig. 10b) until a certain proportion of censored data is reached, and then deteriorates as the proportion of censored data is further increased.

Fig. 10

Extension of the experiment corresponding to Fig. 9b and e, where the proportion of uncensored data used for training was systematically decreased, as indicated in the plots

6 Conclusions

This paper presents the first industrial use case of WTTE-RNN for prognosis. Since the condition data used in the case study were recorded as a time series of sparsely sampled histograms, a technique for preprocessing such data is also presented.

The case study involves a real-world turbocharger failure dataset obtained from a fleet of trucks monitored for a fixed period of time after they commenced operation. The performance of WTTE-RNN is analysed in experiments where uncensored data alone, and a mixture of uncensored and censored data, are used for prognosis of turbocharger failures.

The following key conclusions are deduced based on the experiments discussed in this paper:

  1.

    The high dimensionality of histogram data can be reduced by evaluating the relative entropies (as presented here, or by using KL-divergence measures) of consecutive snapshots of data, and of the current histogram with respect to that of the healthy asset. This substantially reduces the dimensionality while retaining the information necessary for prognosis.

  2.

    This is reflected in the strong performance of WTTE-RNN shown in Figs. 8 and 9, where the error variances and means lie within ranges that enable the maintenance managers to plan mitigation measures in real time.

  3.

    Clustering is not necessarily beneficial for training a prognosis algorithm, because it reduces the size of the training dataset, sometimes to an extent that a model with several free parameters can overfit. This is illustrated in Fig. 8d and e, where clustering shows no substantial benefit compared to the general fleet model.

  4.

    Censored data from assets that did not fail are often ignored by industries. However, the results in Fig. 9 show that incorporating censored data while training a prognosis model can improve the predictions. This is enabled by the Weibull loss function of the WTTE-RNN algorithm.

  5.

    However, care must be taken when incorporating censored data for training, because if the training dataset is highly imbalanced, the algorithm's performance deteriorates drastically, as shown in Fig. 10.