1 Introduction

Reliability, together with availability and maintainability, is a performance indicator that quantifies how long an item can operate without failure (Nakagawa 2005, Furlanetto et al. 2006). It is defined as the probability that the item will perform a required function under stated conditions for a stated period of time (Macchi et al. 2012a). Reliability must be kept at a high level when the cost of failure is high (e.g. spare parts replacement cost, damage cost) and when failures have dramatic consequences in terms of safety (e.g. in airplanes, nuclear and chemical plants) (Furlanetto et al. 2006). In such cases, maintenance costs have to be minimized while keeping the risks within strict limits and meeting the stated requirements.

The system reliability R_sys depends on many factors; the main ones are discussed hereafter. First, R_sys depends on the reliabilities R_i of the individual failure modes. The reliability of a generic failure mode i depends on the parameters used to describe its failure behaviour. If the failure behaviour follows a Weibull distribution, three parameters have to be considered: the typical life α, the shape factor β and the location parameter γ (Macchi et al. 2012a). Second, the number of interventions planned in the reference time horizon influences the system reliability: a higher number of maintenance interventions allows a higher system reliability to be reached (Furlanetto et al. 2006). Third, the system reliability is affected by the so-called human factor. Sometimes operators do not perform the maintenance intervention perfectly and, as a consequence, only a partial (or even null) improvement of the reliability follows. This leads to the concept of imperfect maintenance, which applies to both corrective and preventive maintenance; for further information on the topic, see (Andrew K.S. Jardine 2005, Doostparast et al. 2014, Lie and Chun 1986, Tsai et al. 2001). Finally, the system reliability is affected by the arrangement of the maintenance interventions in the planning horizon, i.e. how the interventions are positioned within the time window under consideration. For the same number of interventions in the planning horizon, a proper arrangement of the interventions over time can lead to higher system reliability, and the arrangement that maximizes the reliability can be found. The search for the arrangement of the interventions that maximizes the reliability is herein defined as orchestration (Fumagalli et al. 2017). Because the scientific literature has paid little attention to this topic, the paper aims to contribute to the research on the impact of this novel factor on the system reliability.

The work focuses on a generic system whose failure modes are assumed to be maintainable, independent and in a series configuration (Fedele et al. 2004). A failure mode is said to be maintainable if it can be improved by means of a maintenance action (Zequeira and Be 2006, Castro 2009, Lin et al. 2000). Two failure modes are independent if an intervention on one of them does not affect the other, and vice versa (Zequeira and Be 2006).

Sect. 2 describes the cross-correlation, the mathematical operator on which the method relies. Sect. 3 presents a method to find the arrangement of the maintenance interventions in the planning horizon that maximizes the system reliability while keeping the number of interventions unchanged. Sect. 4 develops a numerical application of the cross-correlation method. Finally, conclusions and future developments are reported in Sect. 5.

2 Cross-Correlation in Maintenance

Auto-correlation and cross-correlation are mathematical operators that find applications in many fields (Vorburger et al. 2011, Tsai and Lin 2003, Michele et al. 2003). In signal processing, it is very important to have a quantitative indicator that measures the similarity between two signals x(t) and y(t). When two signals are shifted in time with respect to each other, the cross-correlation can determine the time shift between them by identifying the shift at which the two signals are most similar. The formula of the unbiased cross-correlation for two sampled signals x(t) and y(t) can be found in (Doebelin 2008).
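For reference, one commonly used form of the unbiased cross-correlation estimator for two signals sampled at N points is sketched here. The index convention (y advanced by k samples, i.e. shifted to the left) and the absence of mean removal are assumptions; the expression in (Doebelin 2008) may be written differently.

```latex
\hat{R}_{xy}(k) = \frac{1}{N-k} \sum_{n=0}^{N-k-1} x(n)\, y(n+k),
\qquad k = 0, 1, \dots, N-1
```

Dividing by N - k rather than by N compensates for the smaller number of overlapping samples at larger shifts, which is what makes the estimator unbiased.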

The system reliability plot is the signal that results from combining the reliability signals of the failure modes of the system. Since the failure modes are in a series configuration, the system reliability R_sys(t) at a generic time t is the product of the reliabilities of the individual failure modes at that time instant (Furlanetto et al. 2006, Trapani et al. 2015). In the paper the Weibull distribution is used, which is described by three parameters: the life parameter α_i, the shape factor β_i and the location parameter γ_i. It is assumed that a preventive maintenance (PM) intervention on failure mode (FM) i restores the reliability R_i(t) of that failure mode to one. On the contrary, a corrective maintenance (CM) intervention on FM i does not influence the reliability curve R_i(t). Both the average system reliability R_avg and the minimum system reliability R_min are taken into account. This is an innovative aspect since, in the literature, reliability is generally treated only as a constraint to be respected (e.g. the reliability of the system must be kept above a specified threshold) (Das 2007, Doostparast et al. 2014). The two indicators (R_avg and R_min) are needed to demonstrate that the orchestration of the maintenance interventions in the planning horizon influences the reliability of the system.
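The quantities introduced above can be written compactly as follows. The first two lines are the standard three-parameter Weibull reliability function and the series-system product; in the first line t is the time elapsed since the last PM on failure mode i. The last line is an assumption about how the two indicators are evaluated over the planning horizon, since the paper does not give their expressions.

```latex
\begin{align}
R_i(t) &= \exp\!\left[-\left(\frac{t-\gamma_i}{\alpha_i}\right)^{\beta_i}\right],
  \qquad t \ge \gamma_i \\
R_{sys}(t) &= \prod_{i} R_i(t) \\
R_{min} &= \min_{0 \le t \le T_{horizon}} R_{sys}(t), \qquad
R_{avg} = \frac{1}{T_{horizon}} \int_{0}^{T_{horizon}} R_{sys}(t)\, dt
\end{align}
```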

A machine subject to various failure modes is considered. For each failure mode, the best preventive maintenance interval T_pi can be calculated using one of the optimization models available in the literature (for example, the one that minimizes the maintenance costs or the one that maximizes the availability (Macchi et al. 2012b)). A situation that may occur is that two scheduled interventions fall very close in time. The reliability of the system is then very low before the first intervention and almost one after the second. This is not a good situation, and it can be improved in two ways: by suitably spacing the two maintenance interventions, or by aggregating them so that they are performed at the same time. The paper focuses on the first option. Spreading out the maintenance interventions is also desirable when simultaneous downtime of components must be avoided (Anon 1997). Furthermore, it distributes the workload of the maintenance resources, which must be coherent with the maintenance planning and scheduling constraints as well as with the production plan (Macchi et al. 2014a, Macchi et al. 2014b).

The paper addresses the problem of how to find a proper distance between two or more maintenance interventions. The idea is to treat the reliability curves of the various failure modes as signals and to compare them using the cross-correlation. The method, described in detail in the following section, has been developed for a system subject to only two failure modes; the difficulty of extending it to more than two failure modes is explained in Sect. 5. The cross-correlation method modifies a maintenance plan that already exists, rearranging the already planned maintenance interventions in an intelligent manner. It is not a maintenance optimization method in the classical sense (whose output is a maintenance plan), since it is applied after a maintenance plan has been generated. It can be seen as a further tool to improve a maintenance plan: the number of interventions to be performed in the planning horizon remains the same.

3 The Orchestration Model

A system affected by two failure modes (FM1 and FM2), each described by its Weibull parameters (α_1, β_1 and γ_1 for FM1; α_2, β_2 and γ_2 for FM2), is considered. It is assumed that MTTF_1 < MTTF_2, i.e. the first failure mode is more critical than the second one (it has a lower mean time to failure). Strictly speaking, this is not a hypothesis but a naming convention: the following discussion is valid if “failure mode 1” denotes the failure mode with the lowest MTTF, and therefore the one that requires more urgent action. A PM is performed on FM1 every T_p1 weeks and on FM2 every T_p2 weeks. Thus, within the specified time horizon T_horizon, it is known when and on which failure mode a PM has to be performed, i.e. the maintenance plan already exists. The reliability of FM1 is a periodic signal with period T_p1, while the reliability of FM2 has period T_p2. The system reliability can be plotted, considering that at every time instant t it is equal to the product of the reliabilities of the two failure modes at that time.

The system reliability signal can be very irregular because the maintenance interventions have been optimized considering the failure modes separately. In other words, clock-based preventive maintenance is applied to both failure modes but with different intervals: a maintenance intervention on FM i is performed every T_pi, where T_pi is the optimum PM interval for failure mode i (according to the maintenance optimization model that has been used). Sometimes only a few weeks pass between a PM on FM1 and a PM on FM2, and sometimes the two interventions are further apart. The paper proposes a method to space the maintenance interventions in an intelligent manner, based on the cross-correlation operator. The objective is to find the time shift T_p at which the two periodic signals (the reliability R_1(t) of failure mode 1 and the reliability R_2(t) of failure mode 2) present the minimum similarity. The idea is to shift one reliability curve by T_p time units, so that the intervention on one failure mode takes place when the reliability curve of the other is most different. This brings two advantages. The first is that the maintenance interventions are more regularly distributed over the time horizon, which spreads the workload of the maintenance resources. The second is that, with the arrangement proposed by the cross-correlation method, the minimum of the system reliability is higher than in the non-optimized situation.

The cross-correlation method spaces the maintenance interventions in an intelligent way: with the same number of interventions in the time horizon, a higher system reliability is reached. To find the shift T_p at which the two signals (the reliabilities of failure modes 1 and 2) present the minimum similarity, the two periodic signals must be cross-correlated. The cross-correlation function gives reliable results only if it is calculated on a large number of points. The reliability signals are therefore extended so that they contain more than T_horizon points; since they are periodic, this is done simply by appending more periods. Once the two signals have been cross-correlated, the time instant T_min at which the cross-correlation presents a minimum is identified from the cross-correlation plot. T_min is linked to the shift T_p of interest (the two values do not necessarily coincide, as explained below).

Calculating the cross-correlation between R_1(t) and R_2(t) means shifting R_2(t) to the left while keeping R_1(t) fixed. The cross-correlation is a periodic signal, since the two input signals R_1(t) and R_2(t) are periodic. Its period T is equal to the least common multiple (l.c.m.) of T_p1 and T_p2: after a time equal to the l.c.m., the whole signal given by the combination of R_1(t) and R_2(t) repeats itself. It is therefore sufficient to analyse one period of the cross-correlation, since beyond the l.c.m. the information simply repeats. Thanks to the periodicity of the cross-correlation R_xy(t), its calculation for negative time shifts can be avoided. It can happen that the l.c.m., and hence the period T of the cross-correlation, is longer than the planning horizon T_horizon of the maintenance problem. In this situation the cross-correlation plot is analysed only from t = 0 to t = T_horizon, and the minimum of the cross-correlation is searched for in this time frame. Once the time instant of minimum cross-correlation has been identified, T_p, the optimum time shift of interest, must be derived from it. Two possible cases can occur:

  • T_min < T_p2. The time instant of the minimum of the cross-correlation is lower than the period of the shifted signal R_2(t). In this case T_p = T_min, i.e. the optimal distance between the interventions is directly the time instant T_min.

  • T_min > T_p2. The minimum similarity between R_1(t) and R_2(t) is reached by shifting R_2(t) to the left by T_min. Since R_2(t) is periodic with period T_p2 and T_min > T_p2, shifting R_2(t) to the left by T_min is equivalent to shifting it by n time units, where n = T_min - T_p2 is the difference between T_min and T_p2. In this case T_p = n (if n is itself larger than T_p2, the period T_p2 can be subtracted again for the same reason).
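The procedure described in this section can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the weekly sampling, the convention that the sample at a PM week is the restored value, the absence of mean removal in the estimator, and the demonstration parameters are all hypothetical. The two cases for T_p are folded into a single remainder operation, which follows from the periodicity of R_2(t). The value of T_min obtained this way depends on the estimator and on how many periods are appended, so the sketch is not claimed to reproduce the figures of the paper.

```python
import math

def weibull_reliability(age, alpha, beta, gamma=0.0):
    """Weibull reliability at a given age (1.0 up to the location parameter)."""
    if age <= gamma:
        return 1.0
    return math.exp(-(((age - gamma) / alpha) ** beta))

def periodic_reliability(alpha, beta, period, n_points):
    """Weekly samples of a reliability curve reset to 1 by a PM every `period` weeks.

    Assumption: the sample taken at a PM week is the restored value (age 0).
    """
    return [weibull_reliability(t % period, alpha, beta) for t in range(n_points)]

def unbiased_xcorr(x, y, max_lag):
    """Unbiased cross-correlation for lags 0..max_lag; y is shifted to the left."""
    n = min(len(x), len(y))
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the signal length")
    return [
        sum(x[i] * y[i + k] for i in range(n - k)) / (n - k)
        for k in range(max_lag + 1)
    ]

def optimal_shift(t_min, t_p2):
    """Reduce the lag of minimum cross-correlation to a shift below one period of R_2."""
    return t_min % t_p2

def find_shift(r1, r2, t_p1, t_p2, t_horizon):
    """Return (T_min, T_p) from the minimum of the cross-correlation."""
    period = math.lcm(t_p1, t_p2)        # period of the cross-correlation (Python 3.9+)
    max_lag = min(period, t_horizon)     # one period, or the horizon if it is shorter
    rxy = unbiased_xcorr(r1, r2, max_lag)
    t_min = min(range(len(rxy)), key=rxy.__getitem__)
    return t_min, optimal_shift(t_min, t_p2)

# Hypothetical parameters, chosen only to exercise the functions.
n_points = 4 * math.lcm(6, 8)            # signals extended over several periods
r1 = periodic_reliability(10.0, 2.0, 6, n_points)
r2 = periodic_reliability(20.0, 3.0, 8, n_points)
t_min, t_p = find_shift(r1, r2, 6, 8, t_horizon=52)
```

Keeping `optimal_shift` separate from the search makes the two cases easy to check by hand: a lag of 5 with a period of 13 is returned unchanged, while a lag of 22 is reduced to 9.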

After T_p has been identified, the system reliability curve can be plotted. It is the product of R_1(t), which has remained fixed, and R_2new(t), which is the R_2(t) signal shifted to the left by T_p time units. A new maintenance plan is thus generated. It contains the same number of interventions as the original plan but has two advantages: the spacing between the interventions is more homogeneous, and the minimum system reliability is higher.

4 Numerical Application of the Model

Let us assume a system subject to two failure modes with the following characteristics:

  • FM1: α_1 = 20, β_1 = 2.5, γ_1 = 0, which corresponds to MTTF_1 = 17.75 weeks.

  • FM2: α_2 = 35, β_2 = 3, γ_2 = 0, which corresponds to MTTF_2 = 31.25 weeks.
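The two MTTF values can be checked with the standard expression for the mean of a Weibull distribution, MTTF = γ + α·Γ(1 + 1/β). The short sketch below does so; the function name is an assumption introduced for illustration.

```python
import math

def weibull_mttf(alpha, beta, gamma=0.0):
    """Mean time to failure of a Weibull distribution: gamma + alpha * Gamma(1 + 1/beta)."""
    return gamma + alpha * math.gamma(1.0 + 1.0 / beta)

mttf_1 = weibull_mttf(20.0, 2.5)   # FM1
mttf_2 = weibull_mttf(35.0, 3.0)   # FM2
print(f"MTTF_1 = {mttf_1:.2f} weeks, MTTF_2 = {mttf_2:.2f} weeks")
```

Both results agree with the values stated in the list to two decimal places, and they confirm the naming convention MTTF_1 < MTTF_2.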

The time horizon T_horizon is one year (52 weeks). A maintenance intervention is assumed to be performed on FM1 every T_p1 = 12 weeks and on FM2 every T_p2 = 13 weeks. The maintenance plan for the as-is situation (original maintenance plan) is represented in Fig. 2. From the analysis of the system reliability plot, the following indicators (average system reliability, minimum system reliability, number of interventions on the first and second failure modes) can be calculated: R_avg = 93.08%, R_min = 72.68%, N_FM1 = 4 and N_FM2 = 4. Looking at the maintenance plan, it is evident that the interventions are not well spread out over the time horizon: for example, the first two fall in consecutive weeks (weeks 12 and 13). The situation can therefore be improved by spacing the maintenance interventions better.

The cross-correlation between R_1(t) and R_2(t) is computed. The cross-correlation is a periodic signal, since the two input signals R_1(t) and R_2(t) are periodic. Its period T is equal to the least common multiple (l.c.m.) of T_p1 and T_p2, which in the example is 156 weeks. Every T weeks the system is therefore completely restored to an as-good-as-new (AGAN) condition, since in the T-th week two perfect maintenance interventions have to be performed, one on FM1 and one on FM2.

The minimum of the cross-correlation is at T_min = 22 (Fig. 1), so the two signals present the minimum similarity at the 22nd week. Since R_2(t) is a periodic signal with period T_p2, shifting it to the left by T_min time units is equivalent to shifting it to the left by T_p = T_min - T_p2 = 22 - 13 = 9 time units. A new reliability curve R_2new(t) can thus be built and used to calculate the new system reliability plot. In particular, the first maintenance intervention will be performed on FM2, the failure mode whose curve has been shifted, at week T_p2 - T_p = 13 - 9 = 4.
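The arithmetic above, and the intervention weeks that follow from it, can be reproduced with a few lines of Python. The helper name `pm_weeks` is hypothetical, T_min = 22 is taken from the text rather than recomputed, and an intervention falling exactly in week 52 is assumed to count as inside the horizon (which is consistent with N_FM2 = 4 for the original plan).

```python
def pm_weeks(first, period, horizon):
    """Weeks in which a PM falls, starting at `first` and repeating every `period` weeks."""
    return list(range(first, horizon + 1, period))

T_P1, T_P2, T_HORIZON = 12, 13, 52
T_MIN = 22                       # taken from the text, not recomputed here

t_p = T_MIN % T_P2               # 22 - 13 = 9: shift of R_2 to the left
first_fm2 = T_P2 - t_p           # 13 - 9 = 4: week of the first PM on FM2

fm1 = pm_weeks(T_P1, T_P1, T_HORIZON)
original_fm2 = pm_weeks(T_P2, T_P2, T_HORIZON)
shifted_fm2 = pm_weeks(first_fm2, T_P2, T_HORIZON)

print("FM1:", fm1)
print("FM2 original:", original_fm2)
print("FM2 shifted:", shifted_fm2)
```

Under these assumptions the shifted plan places the PMs on FM2 at weeks 4, 17, 30 and 43, so the number of interventions on FM2 stays at four.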

Fig. 1. Cross-correlation plot.

Fig. 2. Original maintenance plan.

After the cross-correlation method has been applied, the new maintenance plan (Fig. 3) and the new system reliability signal are analysed. From the analysis of the system reliability plot, the following indicators can be calculated: R_avg = 93.24%, R_min = 74.77%, N_FM1 = 4 and N_FM2 = 4. The indicators show that the cross-correlation method proposes a better arrangement of the maintenance interventions (as can be seen in Fig. 4), leading to a higher minimum system reliability (74.77% against 72.68%) and a slightly higher average system reliability (93.24% against 93.08%) while keeping the same number of interventions. In particular, the cross-correlation method allows finding the best arrangement of the interventions, i.e. the one that gives the highest minimum system reliability.
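The minimum system reliability of the two plans can be checked with the sketch below. It assumes weekly sampling over weeks 0 to 52, that the value read in a PM week is the one just before the intervention, and that R_2new(t) is R_2(t) shifted to the left by 9 weeks; the function names are hypothetical. The average reliability is not reproduced, because it depends on a sampling convention that the paper does not specify.

```python
import math

def weibull_reliability(age, alpha, beta):
    """Two-parameter Weibull reliability (location parameter equal to zero)."""
    return math.exp(-((age / alpha) ** beta))

def age_before_pm(week, period, shift=0):
    """Weeks since the last PM, read just before any PM due in `week`.

    `shift` moves the periodic curve to the left by that many weeks.
    """
    s = week + shift
    return 0 if s == 0 else (s - 1) % period + 1

def system_reliability(week, shift_fm2=0):
    """Series system: product of the reliabilities of FM1 and FM2."""
    r1 = weibull_reliability(age_before_pm(week, 12), 20.0, 2.5)
    r2 = weibull_reliability(age_before_pm(week, 13, shift_fm2), 35.0, 3.0)
    return r1 * r2

def minimum_reliability(shift_fm2, horizon=52):
    """Minimum of the weekly samples of the system reliability over the horizon."""
    return min(system_reliability(w, shift_fm2) for w in range(horizon + 1))

r_min_original = minimum_reliability(shift_fm2=0)
r_min_shifted = minimum_reliability(shift_fm2=9)
print(f"R_min original: {r_min_original:.2%}, R_min shifted: {r_min_shifted:.2%}")
```

Under these assumptions both minima occur just before the PM on FM1 at week 12 and match the reported 72.68% and 74.77%: after the shift, FM2 is only 8 weeks past its last PM at that point instead of 12.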

Fig. 3. New maintenance plan.

5 Conclusions

The cross-correlation method proposed above has some limitations. The following comments are based on numerical tests that have been carried out but are not presented in the paper.

Firstly, the cross-correlation method requires bringing a maintenance intervention forward by T_p time units, in order to reach the situation in which the reliability curves of the two failure modes are least similar (least superimposed) and to obtain the consequent advantages. Bringing an intervention forward may mean that a further intervention on the shifted failure mode has to be performed within the planning horizon. In this case, the hypothesis of keeping the same number of interventions as in the original maintenance plan is violated.

Secondly, there can be a maintenance plan in which the maintenance interventions are already regularly distributed, so that there is no need to apply the cross-correlation method; if the method is applied anyway, it does not lead to satisfactory results. This happens when the two failure modes are very different in terms of MTTF, so that one failure mode must be maintained much more frequently than the other, i.e. T_p1 << T_p2.

Thirdly, the time horizon may be long compared to the intervals T_pi. In this case, the beneficial effect is present only at the beginning of the time horizon and fades away towards the end. The cross-correlation method spaces the interventions of the two failure modes as much as possible, but sooner or later the interventions in the new sequence will be close together again. The method “shifts” in time the problem of having an irregular arrangement of the interventions; if the period to which the problem has been shifted is still part of the planning horizon, the method does not provide satisfactory results. It simply changes the trend: a good situation at the beginning and a bad situation at the end.
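The fading effect can be illustrated with the intervals of Sect. 4 (T_p1 = 12, T_p2 = 13, first PM on FM2 at week 4). The sketch below computes the smallest distance between a PM on FM1 and a PM on FM2 for two horizons; the 156-week horizon and the helper names are hypothetical and serve only as an illustration, not as one of the numerical tests mentioned above.

```python
def pm_weeks(first, period, horizon):
    """Weeks in which a PM falls, starting at `first` and repeating every `period` weeks."""
    return list(range(first, horizon + 1, period))

def min_gap(weeks_a, weeks_b):
    """Smallest distance in weeks between an intervention in one plan and one in the other."""
    return min(abs(a - b) for a in weeks_a for b in weeks_b)

def min_gap_in_horizon(horizon, first_fm2=4, t_p1=12, t_p2=13):
    """Smallest FM1-FM2 distance when FM2 starts at `first_fm2` and FM1 at `t_p1`."""
    fm1 = pm_weeks(t_p1, t_p1, horizon)
    fm2 = pm_weeks(first_fm2, t_p2, horizon)
    return min_gap(fm1, fm2)

gap_one_year = min_gap_in_horizon(52)       # horizon used in Sect. 4
gap_three_years = min_gap_in_horizon(156)   # hypothetical longer horizon
print(gap_one_year, gap_three_years)
```

With a 52-week horizon the closest pair of interventions is 5 weeks apart, whereas over 156 weeks the two schedules coincide at week 108, so the spacing gained at the beginning is lost later in the horizon.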

Finally, the cross-correlation method proposed in Sect. 3 considers a system subject to only two failure modes. The discussion has to be extended to more than two failure modes, since a machine or mechanical system is expected to have more than two failure modes. The main difficulty is that the cross-correlation is a mathematical operator that takes into consideration only two signals at a time. In signal processing there is no extension of the cross-correlation to more than two signals, because it is sufficient to cross-correlate pairs of signals. However, the interest of this paper is not to align the reliability curves of the various failure modes, but rather to find by how many time units the curves have to be shifted so that they present the minimum similarity. A possible future development is to find a way to properly extend the cross-correlation method to a mechanical system subject to a generic number n of failure modes.