System availability assessment using a parametric Bayesian approach: a case study of balling drums
Abstract
Assessment of system availability usually uses either an analytical approach (e.g., Markov/semi-Markov models) or a simulation approach (e.g., Monte Carlo simulation). However, the former cannot handle complicated state changes, and the latter is computationally expensive. Traditional Bayesian approaches may solve these problems; however, because of their computational difficulties, they are not widely applied. The recent proliferation of Markov Chain Monte Carlo (MCMC) approaches has led to the use of Bayesian inference in a wide variety of fields. This study proposes a new approach to system availability assessment: a parametric Bayesian approach using MCMC, which takes advantage of both the analytical and the simulation methods. In this approach, mean time to failure (MTTF) and mean time to repair (MTTR) are treated as distributions instead of being “averaged”, which better reflects reality and compensates for the limitations of simulation data sample size. To demonstrate the approach, the paper considers a case study of a balling drum system in a mining company. In this system, MTTF and MTTR are determined with a Bayesian Weibull model and a Bayesian lognormal model, respectively. The results show that the proposed approach can integrate the analytical and simulation methods to assess system availability and could be applied to other technical problems in asset management (e.g., other industries, other systems).
Keywords
Asset management; System availability; Reliability; Maintainability; Bayesian statistics; Markov Chain Monte Carlo (MCMC); Mining industry
1 Introduction
Availability represents the proportion of a system’s uptime out of its total time in service and is one of the most critical aspects of performance evaluation. Availability is commonly computed from the Mean Time to Failure (MTTF) and the Mean Time to Repair (MTTR), e.g., steady-state availability as MTTF/(MTTF + MTTR). However, these “mean” values are usually point averages; thus, useful information (e.g., trends, system complexity) may be lost, and some problems may even be hidden.
Assessment of system availability has been studied from the design stage to the operational stage in various system configurations (e.g., series, parallel, k-out-of-n, standby, multi-state, or mixed architectures). Approaches to assessing system availability mainly use either analytic or simulation techniques.
In general, analytic techniques represent the system using direct mathematical solutions from applied probability theory to make statements on various performance measures, such as the steady-state availability or the interval availability (Dekker and Groenendijk 1995; Ocnasu 2007). Researchers tend to use Markov models to assess dynamic availability, or semi-Markov models with Laplace transforms to determine average performance measures (Dekker and Groenendijk 1995; Faghih-Roohi et al. 2014). However, such approaches have been criticised as too restrictive for practical problems: they assume constant failure and repair rates, which is unlikely to hold in the real world (Raje et al. 2000; Marquez et al. 2005). Furthermore, the time-dependent availability obtained under a Markovian assumption is not valid for non-Markovian processes (Raje et al. 2000).
Simulation techniques estimate availability by simulating the actual process and random behaviour of the system. Their advantage is that non-Markovian failure and repair processes can be modelled easily (Raje et al. 2000). Recent research has developed Monte Carlo techniques to model the behaviour of complex systems under realistic time-dependent operational conditions (Marquez et al. 2005; Marquez and Iung 2007; Yasseri and Bahai 2018) or to model multi-state systems with operational dependencies (Zio et al. 2007). Although simulation is more flexible, it is computationally expensive.
Bayesian approaches have traditionally been used to assess system availability because they can handle complicated system state changes and avoid computationally expensive simulation; however, their development and application were stalled by strict assumptions on the form of the priors and by computational difficulties. Research has focused more on the selection of priors or the computation of posteriors than on practical problems (Brender 1968a, b; Kuo 1985; Sharma and Bhutani 1993; Khan and Islam 2012).
The recent proliferation of Markov Chain Monte Carlo (MCMC) simulation techniques has led to the use of Bayesian inference in a wide variety of fields. Because MCMC can evaluate high-dimensional integrals numerically (Lin 2014), the choice of prior information and the descriptions of reliability/maintainability can be more flexible and more realistic.
This study proposes a new approach to system availability assessment: a parametric Bayesian approach with MCMC that focuses on the operational stage and uses both analytical and simulation methods. MTTF and MTTR are treated as distributions instead of being “averaged” by point estimation, which is closer to reality; in addition, the limitations of simulation data sample size are addressed by MCMC techniques.
The rest of this paper is organized as follows. Section 2 describes the problem statement, the balling drum system, the data preparation, and the preliminary analysis of failure and repair data. Section 3 proposes a Bayesian Weibull model for MTTF and a Bayesian lognormal model for MTTR and explains how to use an MCMC computational scheme to obtain the parameters’ posterior distributions. Section 4 presents a case study, results, and discussion. Section 5 offers conclusions and suggestions for further study.
2 Problem statement
This section presents the problem statement, the balling drum system and its configuration, the system availability framework, and the data preparation; it then performs a preliminary analysis of the failure and repair data, on which the parametric Bayesian models are subsequently built.
2.1 Balling drum systems in the mining industry
2.2 Data preparation and preliminary analysis
The study uses the failure and repair data of the five balling drums from January 2013 to December 2018. There are 1782 records. In the first step, the null values are removed, and the data are reduced to 1774 records.
After the work order types of these abnormal records are checked, most of them are found to be “preventive maintenance”, which may be due to a lack of maintenance resources. To simplify the study, we assume that maintenance resources are sufficient for preventive maintenance; thus, abnormal data that might be caused by a shortage of spare parts or skilled personnel are not treated specially in this paper.
Preliminary study of failure data and repair data
Balling drum  TTF fit (1st)  TTF fit (2nd)  TTF fit (3rd)  TTR fit (1st)  TTR fit (2nd)  TTR fit (3rd)
1  Weibull  Log-logistic  Lognormal  Lognormal  Weibull  Logistic
2  Weibull  Log-logistic  Lognormal  Lognormal  Weibull  Logistic
3  Weibull  Log-logistic  Lognormal  Lognormal  Weibull  Logistic
4  Weibull  Log-logistic  Lognormal  Lognormal  Weibull  Logistic
5  Weibull  Log-logistic  Lognormal  Lognormal  Weibull  Logistic
Based on these results, the Weibull distribution and the lognormal distribution are selected for the TTF and the TTR, respectively, of balling drums 1–5; they are applied in the parametric Bayesian models in the next section.
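The text does not state the criterion used to rank the candidate distributions in the table above. Purely as an illustration, the sketch below fits two of the candidates by maximum likelihood and ranks them by the Akaike information criterion (AIC); the use of AIC, the function names, and the synthetic sample are assumptions rather than the authors' procedure. The Weibull fit uses the parameterization \(f(t) = \alpha\gamma t^{\alpha - 1}\exp(-\gamma t^{\alpha})\) adopted in Sect. 3.2, and its shape is found by bisection on the profile-likelihood equation.

```python
import math
import random


def weibull_fit(data, iters=100):
    """Weibull MLE for f(t) = a*g*t**(a-1)*exp(-g*t**a).

    Returns (alpha, gamma, log_likelihood). The shape equation is scale-free,
    so the data are divided by their maximum to keep t**a from overflowing.
    """
    n = len(data)
    t_max = max(data)
    scaled = [t / t_max for t in data]
    logs = [math.log(s) for s in scaled]
    mean_log = sum(logs) / n

    def score(a):
        # Profile-likelihood equation for the shape; increasing in a.
        w = [s ** a for s in scaled]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / a - mean_log

    lo, hi = 1e-3, 50.0
    for _ in range(iters):  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    # gamma = n / sum(t**a), evaluated on the log scale.
    log_gamma = math.log(n) - a * math.log(t_max) - math.log(sum(s ** a for s in scaled))
    sum_log_t = sum(math.log(t) for t in data)
    # At the MLE gamma * sum(t**a) = n, so the last likelihood term is -n.
    loglik = n * math.log(a) + n * log_gamma + (a - 1.0) * sum_log_t - n
    return a, math.exp(log_gamma), loglik


def lognormal_fit(data):
    """Lognormal MLE: returns (mu, sigma2, log_likelihood) with log T ~ N(mu, sigma2)."""
    n = len(data)
    logs = [math.log(t) for t in data]
    mu = sum(logs) / n
    sigma2 = sum((x - mu) ** 2 for x in logs) / n
    loglik = -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * n
    return mu, sigma2, loglik


def aic(loglik, n_params=2):
    """Akaike information criterion; smaller is better."""
    return 2.0 * n_params - 2.0 * loglik


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical synthetic TTF sample from W(alpha=0.55, gamma=0.09), by inversion.
    ttf = [(rng.expovariate(1.0) / 0.09) ** (1.0 / 0.55) for _ in range(1000)]
    scores = {"Weibull": aic(weibull_fit(ttf)[2]), "Lognormal": aic(lognormal_fit(ttf)[2])}
    for name in sorted(scores, key=scores.get):
        print(name, round(scores[name], 1))
```

Other criteria (e.g., Kolmogorov–Smirnov or Anderson–Darling statistics) could produce a different ranking; AIC is used here only because it needs nothing beyond the maximized log-likelihoods.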
3 Parametric Bayesian Models
This section proposes a Bayesian Weibull model for TTF and a Bayesian lognormal model for TTR and explains the MCMC computational scheme used to obtain their posterior distributions.
3.1 Markov Chain Monte Carlo with Gibbs sampling

Step 1. Choose an arbitrary starting point \(\theta^{(0)} = (\theta_{1}^{(0)}, \ldots, \theta_{k}^{(0)})\);

Step 2. Generate \(\theta_{1}^{(1)}\) from the conditional distribution \(f(\theta_{1} \mid \theta_{2}^{(0)}, \ldots, \theta_{k}^{(0)})\), and generate \(\theta_{2}^{(1)}\) from the conditional distribution \(f(\theta_{2} \mid \theta_{1}^{(1)}, \theta_{3}^{(0)}, \ldots, \theta_{k}^{(0)})\);

Step 3. For \(j = 3, \ldots, k - 1\), generate \(\theta_{j}^{(1)}\) from \(f(\theta_{j} \mid \theta_{1}^{(1)}, \ldots, \theta_{j-1}^{(1)}, \theta_{j+1}^{(0)}, \ldots, \theta_{k}^{(0)})\);

Step 4. Generate \(\theta_{k}^{(1)}\) from \(f(\theta_{k} \mid \theta_{1}^{(1)}, \theta_{2}^{(1)}, \ldots, \theta_{k-1}^{(1)})\); this completes the one-step transition from \(\theta^{(0)}\) to \(\theta^{(1)} = (\theta_{1}^{(1)}, \ldots, \theta_{k}^{(1)})\), where \(\theta^{(1)}\) is one realization (one transition) of the Markov chain.

Step 5. Return to Step 2, with the current state in place of \(\theta^{(0)}\).
After \(t\) iterations, \(\theta^{(t)} = (\theta_{1}^{(t)}, \ldots, \theta_{k}^{(t)})\) is obtained, and with it each component of \(\theta\). Under mild regularity conditions and whatever the starting point \(\theta^{(0)}\), as \(t \to \infty\) the distribution of \(\theta^{(t)}\) converges to the stationary distribution of the chain, which is the joint (posterior) distribution; by the ergodic theorem, averages over the chain converge to the corresponding expectations. Once the chain is judged to have converged, the sampled points are treated as observations from that distribution.
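Steps 1–5 can be illustrated with a target whose full conditionals are available in closed form. The sketch below assumes a standard bivariate normal with correlation \(\rho\) (so \(k = 2\)), for which \(\theta_1 \mid \theta_2 \sim N(\rho\theta_2, 1 - \rho^2)\) with the second argument a variance, and symmetrically for \(\theta_2\). This toy target and the function name are hypothetical and unrelated to the balling drum data; draws before the burn-in are discarded, as in the case study.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, start=(5.0, -5.0), seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each sweep draws theta1 | theta2 and then theta2 | theta1 (Steps 2-4),
    and the loop repeats (Step 5). Draws before burn_in are discarded.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    theta1, theta2 = start                # Step 1: arbitrary starting point
    draws = []
    for t in range(n_iter):
        theta1 = rng.gauss(rho * theta2, cond_sd)  # conditions on the previous theta2
        theta2 = rng.gauss(rho * theta1, cond_sd)  # conditions on the new theta1
        if t >= burn_in:
            draws.append((theta1, theta2))
    return draws


if __name__ == "__main__":
    samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000, burn_in=1000)
    m1 = sum(s[0] for s in samples) / len(samples)
    cross = sum(s[0] * s[1] for s in samples) / len(samples)
    print(f"mean(theta1) ~ {m1:.3f}, E[theta1*theta2] ~ {cross:.3f}")
```

Despite the deliberately poor starting point (5, −5), the retained draws reproduce the target's zero means and correlation, which is the convergence property described above.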
3.2 Bayesian Weibull model for TTF
Suppose the time to failure (TTF) data \(t = (t_{1}, t_{2}, \ldots, t_{n})'\) for \(n\) individuals are i.i.d., each following a two-parameter Weibull distribution \(W(\alpha, \gamma)\), where \(\alpha > 0\) is the shape parameter and \(\gamma > 0\). Then the p.d.f. is \(f(t_{i} \mid \alpha, \gamma) = \alpha \gamma t_{i}^{\alpha - 1} \exp(-\gamma t_{i}^{\alpha})\), and the c.d.f. is \(F(t_{i} \mid \alpha, \gamma) = 1 - \exp(-\gamma t_{i}^{\alpha})\). The reliability function is \(R(t_{i} \mid \alpha, \gamma) = 1 - F(t_{i} \mid \alpha, \gamma) = \exp(-\gamma t_{i}^{\alpha})\).
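The three functions above can be coded directly, together with the mean time to failure implied by this parameterization. The MTTF expression \(\gamma^{-1/\alpha}\,\Gamma(1 + 1/\alpha)\) follows from integrating \(R(t)\) over \(t \ge 0\); it is derived here rather than quoted from the paper, and the function names are hypothetical. With the drum 1 posterior means reported in Sect. 4.2 (\(\alpha = 0.5409\), \(\gamma = 0.0928\)), the plug-in MTTF is about 142, close to the tabulated posterior mean of 145.0; the two need not coincide, because the posterior mean of a nonlinear function is not the function of the posterior means.

```python
import math


def weibull_pdf(t, alpha, gamma):
    """f(t | alpha, gamma) = alpha * gamma * t**(alpha - 1) * exp(-gamma * t**alpha)."""
    return alpha * gamma * t ** (alpha - 1.0) * math.exp(-gamma * t ** alpha)


def weibull_cdf(t, alpha, gamma):
    """F(t | alpha, gamma) = 1 - exp(-gamma * t**alpha)."""
    return 1.0 - math.exp(-gamma * t ** alpha)


def weibull_reliability(t, alpha, gamma):
    """R(t | alpha, gamma) = exp(-gamma * t**alpha) = 1 - F(t | alpha, gamma)."""
    return math.exp(-gamma * t ** alpha)


def weibull_mttf(alpha, gamma):
    """Mean of W(alpha, gamma): integral of R(t) over t >= 0."""
    return gamma ** (-1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)


if __name__ == "__main__":
    # Plug-in MTTF from the drum 1 posterior means of alpha and gamma.
    print(round(weibull_mttf(0.5409, 0.0928), 1))
```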
3.3 Bayesian Lognormal model for TTR
4 Case study
This section presents a case study; it explains the procedure, gives the results, and offers a discussion.
4.1 The procedure
Steps in the system availability assessment
Step  Name  Purpose  Outputs in this case
1  Configuration definition  System configuration and dependencies determined to calculate system availability  Five balling drums in parallel, operating independently (see Sect. 2.1)
2  Data collection  Reliability and maintenance data (and information) collected  1774 records of failure and repair data for the five balling drums collected from 2013 to 2018 (see Sect. 2.2)
3  Data preparation  Data cleaned and outliers removed as needed  Null values removed and abnormal data checked (see Sect. 2.2)
4  Preliminary analysis  Preliminary studies of TTF and TTR data performed to decide the baseline distributions  TTF fits a Weibull distribution; TTR fits a lognormal distribution (see Sect. 2.2)
5  Parametric Bayesian model building  Prior distributions defined and analytic models developed  Bayesian Weibull model for TTF with gamma priors and Bayesian lognormal model for TTR with normal and gamma priors constructed (see Sect. 3)
6  MCMC simulation  Burn-in defined and MCMC simulation implemented; convergence diagnostics and Monte Carlo error checked to confirm the validity of the results  Burn-in of 1000 samples used with an additional 10,000 Gibbs samples for each Markov chain (see Sects. 3 and 4.2)
7  Results and analysis  Results, calculation, and discussion  Results for parameters of interest in system availability assessment (see Sects. 4.2 and 4.3)
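Step 6 calls for convergence diagnostics across chains. The paper does not name the diagnostic it used; a common choice for multiple chains, assumed here, is the Gelman–Rubin potential scale reduction factor \(\hat{R}\), which compares the between-chain and within-chain variances of the post-burn-in draws (WinBUGS provides a related Brooks–Gelman–Rubin diagnostic). The sketch below implements the basic form of \(\hat{R}\); values close to 1 suggest that the chains have reached the same distribution. The function name and the synthetic chains are hypothetical.

```python
import math
import random


def gelman_rubin(chains):
    """Basic Gelman-Rubin potential scale reduction factor (R-hat).

    `chains` holds m equal-length lists of post-burn-in draws of one scalar
    parameter (e.g. alpha for one balling drum).
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((cm - grand) ** 2 for cm in means)
    w = sum(sum((x - cm) ** 2 for x in c) / (n - 1) for c, cm in zip(chains, means)) / m
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical chains of 10,000 retained draws each, mirroring Step 6.
    chains = [[rng.gauss(0.54, 0.02) for _ in range(10000)] for _ in range(3)]
    print(round(gelman_rubin(chains), 3))
```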
4.2 Results
In this case study, the calculations are implemented in WinBUGS. Three Markov chains are run for each MCMC simulation; for each chain, a burn-in of 1000 samples is discarded, and an additional 10,000 Gibbs samples are retained. The following vague priors are used:
 For the Bayesian Weibull model using TTF data:$$\alpha \sim G\left( {0.0001,0.0001} \right),\quad \gamma \sim G\left( {0.0001,0.0001} \right)$$
 For the Bayesian lognormal model using TTR data:$$\mu \sim N\left( {0,0.0001} \right),\quad \sigma \sim G\left( {0.0001,0.0001} \right).$$
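WinBUGS selects its updating algorithms internally. As a rough, self-contained analogue of the Weibull model with the gamma priors above, the sketch below uses Metropolis-within-Gibbs: \(\gamma\) is drawn exactly from its full conditional \(\mathrm{Gamma}(a + n, b + \sum_i t_i^{\alpha})\) (shape, rate), which follows from the conjugacy of a gamma prior with the Weibull likelihood in this parameterization, while \(\alpha\), which has no standard full conditional, is updated by a random-walk Metropolis step on \(\log \alpha\). The sampler design, the step size, the synthetic data, and all names are assumptions for illustration; \(G(a, b)\) is read as shape \(a\) and rate \(b\), as in the WinBUGS dgamma convention.

```python
import math
import random


def weibull_mwg(ttf, n_iter=4000, burn_in=1000, a0=1e-4, b0=1e-4, step=0.03, seed=0):
    """Metropolis-within-Gibbs for TTF ~ W(alpha, gamma) with gamma priors.

    Model: f(t) = alpha*gamma*t**(alpha-1)*exp(-gamma*t**alpha);
    priors alpha ~ Gamma(a0, b0), gamma ~ Gamma(a0, b0) in shape/rate form.
    Returns the post-burn-in draws as a list of (alpha, gamma) tuples.
    """
    rng = random.Random(seed)
    n = len(ttf)
    sum_log_t = sum(math.log(t) for t in ttf)
    alpha = 1.0                             # arbitrary starting value
    s_alpha = sum(t ** alpha for t in ttf)  # running sum of t_i**alpha
    draws = []
    for it in range(n_iter):
        # Gibbs step: gamma | alpha, t ~ Gamma(a0 + n, rate b0 + sum t**alpha).
        # random.gammavariate takes (shape, scale), so pass 1/rate.
        gamma = rng.gammavariate(a0 + n, 1.0 / (b0 + s_alpha))
        # Metropolis step for alpha | gamma, t: symmetric random walk on log(alpha).
        prop = alpha * math.exp(rng.gauss(0.0, step))
        s_prop = sum(t ** prop for t in ttf)
        log_ratio = ((a0 + n) * math.log(prop / alpha)  # prior, likelihood and Jacobian
                     - b0 * (prop - alpha)
                     + (prop - alpha) * sum_log_t
                     - gamma * (s_prop - s_alpha))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            alpha, s_alpha = prop, s_prop
        if it >= burn_in:
            draws.append((alpha, gamma))
    return draws


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data drawn from W(alpha=0.55, gamma=0.09) by inversion.
    data = [(rng.expovariate(1.0) / 0.09) ** (1.0 / 0.55) for _ in range(500)]
    draws = weibull_mwg(data)
    print(sum(a for a, _ in draws) / len(draws), sum(g for _, g in draws) / len(draws))
```

Updating \(\gamma\) exactly keeps the sampler simple; because \(\alpha\) and \(\gamma\) are strongly correlated a posteriori in this parameterization, the chain mixes more slowly than a joint update would, which is one reason burn-in and convergence checks matter.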
Posterior statistics in Bayesian Weibull model for TTF
Balling drum  Parameter  Mean  SD  MC error  95% HPD interval
1  \(\alpha\)  0.5409  0.0231  4.288E−4  (0.4964, 0.5867)
1  \(\gamma\)  0.0928  0.0120  2.235E−4  (0.0712, 0.1178)
2  \(\alpha\)  0.5747  0.0288  6.289E−4  (0.5195, 0.6324)
2  \(\gamma\)  0.0642  0.0109  2.334E−4  (0.0451, 0.0876)
3  \(\alpha\)  0.5975  0.0251  5.004E−4  (0.5974, 0.6481)
3  \(\gamma\)  0.0712  0.0098  1.942E−4  (0.0707, 0.0922)
4  \(\alpha\)  0.5745  0.0245  4.885E−4  (0.5272, 0.6236)
4  \(\gamma\)  0.0750  0.0104  2.028E−4  (0.0564, 0.0970)
5  \(\alpha\)  0.5560  0.0216  4.135E−4  (0.5558, 0.5988)
5  \(\gamma\)  0.0958  0.0112  2.158E−4  (0.0952, 0.1196)
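The paper does not say how the 95% HPD intervals in the posterior tables were computed from the MCMC output. A common sample-based estimate, assumed here, is the shortest interval that contains 95% of the sorted draws, which approximates the highest posterior density interval when the posterior is unimodal. The function name is hypothetical, and the draws in the usage example are synthetic.

```python
import math
import random


def hpd_interval(draws, prob=0.95):
    """Shortest interval containing `prob` of the draws (sample HPD, unimodal case)."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, int(math.ceil(prob * n)))  # number of draws the interval must cover
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    for i in range(n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


if __name__ == "__main__":
    rng = random.Random(0)
    # Right-skewed synthetic draws: the HPD interval hugs the mode at 0, unlike
    # the equal-tailed interval between the 2.5% and 97.5% quantiles.
    draws = [rng.expovariate(1.0) for _ in range(20000)]
    print(hpd_interval(draws))
```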
Posterior statistics in Bayesian lognormal model for TTR
Balling drum  Parameter  Mean  SD  MC error  95% HPD interval
1  \(\mu\)  −0.1842  0.1107  6.730E−4  (−0.4015, 0.0342)
1  \(\sigma\)  0.2270  0.0169  9.565E−5  (0.1951, 0.2615)
2  \(\mu\)  −0.0075  0.1424  8.504E−4  (−0.2845, 0.2697)
2  \(\sigma\)  0.1861  0.0161  9.140E−5  (0.1556, 0.2193)
3  \(\mu\)  −0.4574  0.1134  6.540E−4  (−0.4578, −0.2354)
3  \(\sigma\)  0.2196  0.0164  9.621E−5  (0.2191, 0.2533)
4  \(\mu\)  −0.3540  0.1145  7.052E−4  (−0.5787, −0.1297)
4  \(\sigma\)  0.2184  0.0166  9.845E−5  (0.1871, 0.2523)
5  \(\mu\)  −0.3484  0.1023  6.265E−4  (−0.3486, −0.1488)
5  \(\sigma\)  0.2195  0.0148  8.614E−5  (0.2189, 0.2495)
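For the lognormal model, the sketch below works on \(y_i = \log \mathrm{TTR}_i\), assumed normal with mean \(\mu\) and precision \(\tau\). It assumes, following the WinBUGS dlnorm(mu, tau) convention, that the second parameter (written \(\sigma\) in the prior and in the table above) is a precision; under that reading both full conditionals are standard (normal for \(\mu\), gamma for \(\tau\)), so plain Gibbs sampling applies, and \(\mathrm{MTTR} = \exp(\mu + 1/(2\tau))\). The reading is supported by a plug-in check: with the drum 1 posterior means (\(\mu = -0.1842\), \(\tau = 0.2270\)) the formula gives about 7.53, close to the tabulated MTTR mean of 7.779, whereas treating 0.2270 as a standard deviation would give about 0.85. Names and defaults are hypothetical; the default of 11,000 iterations with a burn-in of 1000 mirrors the run length used in Sect. 4.2.

```python
import math
import random


def lognormal_gibbs(ttr, n_iter=11000, burn_in=1000, m0=0.0, p0=1e-4, a0=1e-4, b0=1e-4, seed=0):
    """Gibbs sampler for log(TTR) ~ N(mu, 1/tau).

    Priors: mu ~ N(m0, precision p0) and tau ~ Gamma(a0, b0) in shape/rate form.
    Returns the post-burn-in draws as a list of (mu, tau) tuples.
    """
    rng = random.Random(seed)
    y = [math.log(t) for t in ttr]
    n, sum_y = len(y), sum(y)
    mu, tau = 0.0, 1.0  # arbitrary starting point
    draws = []
    for it in range(n_iter):
        # mu | tau, y: normal with precision p0 + n*tau.
        prec = p0 + n * tau
        mu = rng.gauss((p0 * m0 + tau * sum_y) / prec, 1.0 / math.sqrt(prec))
        # tau | mu, y: Gamma(a0 + n/2, rate b0 + SS/2); gammavariate wants a scale.
        ss = sum((v - mu) ** 2 for v in y)
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * ss))
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


def lognormal_mttr(mu, tau):
    """Mean of a lognormal variable whose logarithm has mean mu and precision tau."""
    return math.exp(mu + 0.5 / tau)


if __name__ == "__main__":
    # Plug-in MTTR for drum 1, reading sigma in the table as a precision.
    print(round(lognormal_mttr(-0.1842, 0.2270), 2))  # prints 7.53
```

Applying lognormal_mttr to every retained (mu, tau) pair, rather than to the posterior means, gives draws from the posterior of MTTR.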
Statistics of individual availability
Balling drum  MTTF mean  MTTF 95% HPD interval  MTTR mean  MTTR 95% HPD interval  Availability mean  Availability 95% HPD interval
1  145.0  (118.1, 178.0)  7.779  (5.284, 11.58)  0.9487  (0.9229, 0.9665)
2  196.4  (157.7, 256.0)  15.48  (8.927, 26.60)  0.9265  (0.8766, 0.9582)
3  128.7  (127.9, 155.0)  6.381  (6.194, 9.622)  0.9525  (0.9538, 0.9693)
4  148.5  (122.5, 180.3)  7.178  (4.755, 10.86)  0.9536  (0.9291, 0.9702)
5  115.8  (115.1, 139.0)  7.083  (6.926, 10.22)  0.9420  (0.9433, 0.9610)
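The availability figures combine MTTF and MTTR. The exact availability equation used in the paper is not reproduced in this section, so the steady-state relation \(A = \mathrm{MTTF}/(\mathrm{MTTF} + \mathrm{MTTR})\) is assumed here. Applied draw by draw (pairing the separately fitted TTF and TTR chains by index), it yields posterior draws of \(A\); applied to the tabulated means of drum 1 it gives about 0.949, consistent with the reported mean of 0.9487. The function names are hypothetical.

```python
def availability(mttf, mttr):
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def availability_draws(mttf_draws, mttr_draws):
    """Apply A = MTTF / (MTTF + MTTR) draw by draw to get posterior draws of A."""
    return [availability(f, r) for f, r in zip(mttf_draws, mttr_draws)]


if __name__ == "__main__":
    # Plug-in check with the drum 1 means from the table above.
    print(round(availability(145.0, 7.779), 4))  # prints 0.9491
```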
4.3 Discussion
Equation (17) shows that availability can be assessed flexibly, in a way that reflects actual operating conditions. First, the parametric Bayesian models using MCMC make the calculation of posteriors feasible. More importantly, parametric Bayesian models can be applied to predict future TTF, TTR, and system availability.
In this study, because the five balling drums are relatively new and prior information is lacking, gamma and normal distributions are selected as vague priors. The priors could be improved with more historical data or expert experience.
The system configurations could be extended to other, more complex architectures (series, k-out-of-n, standby, multi-state, or mixed) by modifying Eq. (2).
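Eq. (2) is not reproduced in this section, so the sketch below does not restate it. Assuming independent units, the standard steady-state results for some of the architectures mentioned are: a series system is available only if all units are, so \(A_s = \prod_i A_i\); a parallel system is available if at least one unit is, so \(A_p = 1 - \prod_i (1 - A_i)\); and a k-out-of-n system is available if at least \(k\) units are, which for non-identical units can be computed by building the distribution of the number of available units one unit at a time. The per-drum means from the availability table are used only as illustrative inputs; this is not the paper's calculation, and the function names are hypothetical.

```python
from math import prod


def series(avails):
    """Series system: every unit must be up."""
    return prod(avails)


def parallel(avails):
    """Parallel system: at least one unit must be up."""
    return 1.0 - prod(1.0 - a for a in avails)


def k_out_of_n(avails, k):
    """System is up when at least k of the independent units are up."""
    dist = [1.0]  # dist[j] = P(exactly j of the units processed so far are up)
    for a in avails:
        new = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new[j] += p * (1.0 - a)  # this unit down
            new[j + 1] += p * a      # this unit up
        dist = new
    return sum(dist[k:])


if __name__ == "__main__":
    av = [0.9487, 0.9265, 0.9525, 0.9536, 0.9420]  # illustrative per-drum means
    print(series(av), parallel(av), k_out_of_n(av, 4))
```

Series and parallel are the special cases \(k = n\) and \(k = 1\) of the k-out-of-n function, which gives a convenient consistency check.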
The data analysis reveals that, for the TTF data, the shape parameter of the Weibull distribution is less than 1. This implies a failure rate that decreases over time (as in the early stage of the bathtub curve), which does not match typical experience with mechanical equipment. The TTF data include not only corrective maintenance but also preventive maintenance, and in this case study a high percentage of the TTF work orders are for preventive maintenance. The decreasing trend therefore also suggests that TTF could be improved by improving the preventive maintenance plan.
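The link between a shape parameter below 1 and a decreasing failure rate follows from the hazard function of the Weibull model in Sect. 3.2, \(h(t) = f(t)/R(t) = \alpha\gamma t^{\alpha - 1}\), which decreases in \(t\) whenever \(\alpha < 1\) and is constant when \(\alpha = 1\). A minimal check, using the drum 1 posterior means from the Weibull posterior table as illustrative inputs (the function name is hypothetical):

```python
def weibull_hazard(t, alpha, gamma):
    """h(t) = f(t) / R(t) = alpha * gamma * t**(alpha - 1) for W(alpha, gamma)."""
    return alpha * gamma * t ** (alpha - 1.0)


if __name__ == "__main__":
    alpha, gamma = 0.5409, 0.0928  # drum 1 posterior means
    for t in (1.0, 10.0, 100.0, 1000.0):
        print(t, weibull_hazard(t, alpha, gamma))  # decreases because alpha < 1
```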
The seven steps of the assessment can be mapped onto three stages: Steps 1 to 4 form the Plan stage, Steps 5 and 6 the Do and Check stages, and Step 7 the Act stage. The outputs of Step 7 can become inputs to Step 2 for the next calculation period, so the steps follow the PDCA (Plan-Do-Check-Act) cycle, and the results can be continuously improved.
5 Conclusions
This study proposes a parametric Bayesian approach for system availability assessment at the operational stage. MCMC is adopted to take advantage of both the analytical and the simulation methods.
In this approach, MTTF and MTTR are treated as distributions instead of being “averaged” by a point estimation. This better reflects reality; in addition, the limitations of simulation data sample size are compensated for by MCMC techniques.
In the case study, TTF and TTR are modelled using a Bayesian Weibull model and a Bayesian lognormal model, respectively. The results show that the proposed approach can integrate the analytical and simulation methods for system availability assessment and could be applied to other technical problems in asset management (e.g., other industries, other systems).
Acknowledgements
The motivation for the research originated from the project “Key Performance Indicators (KPI) for control and management of maintenance process through eMaintenance (In Swedish: Nyckeltal för styrning och uppföljning av underhållsverksamhet m h a eUnderhåll)”, which was initiated and financed by LKAB. The authors wish to thank Ramin Karim, Peter Olofsson, Mats Renfors, Sylvia Simma, Maria Rytty, Mikael From and Johan Enbak, for their support for this research in the form of funding and work hours.
References
 Brender DM (1968a) The Bayesian assessment of system availability: advanced applications and techniques. IEEE Trans Reliab 17(3):138–147
 Brender DM (1968b) The prediction and measurement of system availability: a Bayesian treatment. IEEE Trans Reliab 17(3):127–138
 Dekker R, Groenendijk W (1995) Availability assessment methods and their application in practice. Microelectron Reliab 35(9–10):1257–1274
 Faghih-Roohi S, Xie M, Ng KM, Yam RC (2014) Dynamic availability assessment and optimal component design of multi-state weighted k-out-of-n systems. Reliab Eng Syst Saf 123:57–62
 Khan MA, Islam H (2012) Bayesian analysis of system availability with half-normal life time. Qual Technol Quant Manag 9(2):203–209
 Kuo W (1985) Bayesian availability using gamma distributed priors. IIE Trans 17(2):132–140
 Lin J (2014) An integrated procedure for Bayesian reliability inference using Markov Chain Monte Carlo methods. J Qual Reliab Eng 2014:1–16
 Marquez AC, Iung B (2007) A structured approach for the assessment of system availability and reliability using Monte Carlo simulation. J Qual Maint Eng 13(2):125–136
 Marquez AC, Heguedas AS, Iung B (2005) Monte Carlo-based assessment of system availability. A case study for cogeneration plants. Reliab Eng Syst Saf 88:273–289
 Ocnasu AB (2007) Distribution system availability assessment—Monte Carlo and antithetic variates method. In: 19th international conference on electricity distribution, Vienna
 Raje D, Olaniya R, Wakhare P, Deshpande A (2000) Availability assessment of a two-unit standby pumping system. Reliab Eng Syst Saf 68:269–274
 Sharma K, Bhutani R (1993) Bayesian analysis of system availability. Microelectron Reliab 33(6):809–811
 Yasseri SF, Bahai H (2018) Availability assessment of subsea distribution systems at the architectural level. Ocean Eng 153:399–411
 Zio E, Marella M, Podofillini L (2007) A Monte Carlo simulation approach to the availability assessment of multi-state system with operational dependencies. Reliab Eng Syst Saf 92:871–882
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.