# Reducing Dimensionality of Multi-regime Data for Failure Prognostics

## Abstract

Over the last decade, the prognostics and health management literature has introduced many conceptual frameworks for remaining useful life predictions. However, estimating the future behavior of critical machinery systems is a challenging task due to the uncertainties and complexity involved in the multi-dimensional condition monitoring data. Even though many studies have reported promising methods in data processing and dimensionality reduction, the prognostics applications require integration of these methods with remaining useful life estimations. This paper describes a multiple linear regression process that reduces the number of data regimes under consideration by obtaining a set of principal degradation variables. The process also extracts health indicators and useful features. Finally, a state-space model based on frequency-domain data is used to estimate remaining useful life. The presented approach is assessed with a case study on turbofan engine degradation simulation dataset, and the prediction performance is validated by error-based prognostic metrics.

## Keywords

Failure prognostics, Multi-dimensional data, Dimensionality reduction, Remaining useful life estimation

## Introduction

Table 1 Turbofan engine degradation simulation dataset characteristics

Dataset | Train trajectories | Test trajectories | Conditions | Fault modes |
---|---|---|---|---|
FD001 | 100 | 100 | One (sea level) | One (HPC degradation) |
FD002 | 260 | 259 | Six | One (HPC degradation) |
FD003 | 100 | 100 | One (sea level) | Two (HPC and fan degradation) |
FD004 | 248 | 249 | Six | Two (HPC and fan degradation) |

Prognostics can contribute to these changing expectations by providing dynamic maintenance planning strategies for critical engineering systems, improving reliability and reducing the operation and maintenance costs of complex systems. As a steadily growing subject, prognostics has developed expertise across various disciplines [2]. Many advances in remaining useful life estimation have been reported for complex engineering systems such as electronics [3, 4], batteries [5, 6], actuators [7], turbofan engines [8, 9] and NASA’s launch vehicles and spacecraft systems [10].

In general, a typical prognostic method for complex systems depends on measured condition monitoring data and provides simplified representations of complex datasets. Because operations are generally performed in multiple regimes, data processing becomes a major issue for prognostics practitioners. Since the operational and environmental conditions can rarely be evaluated directly, a systematic data processing framework is required to account for the uncertainties in prognostics [11]. Such data processing can analyze the uncertainties in condition monitoring data for a better understanding of the system’s damage propagation under upcoming operational and environmental conditions.

The main objective of this paper is to develop a conceptual prognostic framework to overcome the issues presented by noisy and multi-dimensional data. Multiple linear regression is used to model the relationship between different explanatory regime variables and a monotonic response variable by fitting an equation to monitoring data. This process returns the coefficient estimates for a multiple linear regression of the responses which can be further used to calculate the response variables of all different operational trajectories. A state-space model is then proposed to use these response variables for the multi-step ahead remaining useful life (RUL) predictions.

## Motivation and Problem Statement

For a degradation process that is predicated on system aging and monotonic damage accumulation and manifests itself in the physical composition of the system, it should be possible to correlate sensor behavior with signs of aging to estimate the remaining useful life of systems [12]. However, the multi-dimensional data produced by multiple operational regimes do not directly provide useful information for measuring the monotonic damage accumulation. Further processing is needed to extract useful information for remaining useful life predictions.

*i*th unit in a dataset and

The preprocessing of such raw data is an essential step to any study relying on any type of data-driven techniques. To obtain a meaningful wear level index for prognosis, a data processing approach is applied for feature extraction, data cleaning and feature selection. The characteristics of raw data and system conditions are first extracted. Then, any useless and misleading outliers caused by noise during operations are removed. This practice first deals with the issues relating to organizing the multi-dimensional data to reduce data redundancy and improve regime integrity. The noisy sensors operating under different regimes are standardized to each other, and so the common behavior of sensors can be observed and investigated. Next, a wear level index can provide comparable and actionable information about the common population health, as well as track degradation progress and performance over time.

## Signal Processing and Dimensionality Reduction

The “turbofan engine degradation simulation dataset” used in this paper was provided by the Prognostics CoE at NASA Ames and made publicly available [13]. Engine degradation simulation was carried out using the C-MAPSS software, and four different scenarios were simulated under different combinations of operational conditions, regimes and fault modes (see Table 1). Several sensor channels in the datasets characterize the fault evolution. Users are expected to develop their algorithms on the training sets and to make remaining useful life estimations on the test sets provided in the package.

All four datasets consist of multivariate time series, which are assembled into training and test subsets. Each series starts in normal operational conditions with an unknown, case-specific initial wear level that is considered normal [13].

Each training time series covers a full operational period that terminates at a failure point due to wear. The test subsets, on the other hand, end at some point before the engine reaches system failure. The challenge is to predict the remaining useful life between the end of each test series and its actual failure point, and to validate the results against the true RUL values of the test data, which are given separately as a vector [14].
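Because each training trajectory runs to failure, a remaining useful life label can be attached to every cycle by counting back from the last recorded cycle. The following is a minimal sketch of that idea; the function name `rul_labels` and the toy cycle list are illustrative assumptions and are not taken from the dataset package.

```python
def rul_labels(cycles):
    """Return the remaining useful life at each cycle of one trajectory.

    Assumption: the trajectory is run-to-failure, so the last recorded
    cycle is treated as the failure point and gets an RUL of 0.
    """
    last_cycle = max(cycles)
    return [last_cycle - c for c in cycles]


# Toy trajectory with five operational cycles (illustrative values only).
toy_cycles = [1, 2, 3, 4, 5]
toy_rul = rul_labels(toy_cycles)
print(toy_rul)
```

For a test trajectory the same counting does not apply, since the series stops before failure and the true RUL at the final cycle comes from the separately supplied vector.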

Each measurement in both data subsets is a snapshot taken during a single operational cycle. Although the measurements are not named, users know that they correspond to different variables [13].

Datasets with single and multiple operational regimes are used in this paper. It is observed that some sensors behave differently in different datasets. The raw measurements are highly noisy and scattered values with different value ranges in each single series.

### Multiple Linear Regression

The raw values of the selected time series are inconsistent with each other, so a feature extraction step is needed to transform the multi-regime data from the high-dimensional space to a single wear level dimension. This transformation reduces the dimensionality of the time series from their original scales to a notionally common scale that carries meaningful information for prognosis.

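In the standard multiple linear regression form \(Y = X\beta + \varepsilon\), *Y* is an \(n \times 1\) vector of values of the target variable, *X* is an \(n \times p\) matrix of observed explanatory variables, \(\beta\) is a \(p \times 1\) vector of coefficient estimates for the multiple linear regression of the responses, and \(\varepsilon\) is an \(n \times 1\) vector of residuals.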
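The regression step described above can be sketched with an ordinary least-squares fit. This is a hedged illustration, not the authors' implementation: the helper names `fit_wear_regression` and `predict_wear` are hypothetical, the data are synthetic rather than C-MAPSS readings, and the intercept is handled by adding a column of ones to the explanatory matrix (so that column counts among the *p* columns of *X*).

```python
import numpy as np


def _design(X):
    """Prepend a column of ones so the fit includes an intercept term."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def fit_wear_regression(X, y):
    """Least-squares coefficient estimates for y ~ [1, X] @ beta."""
    beta, *_ = np.linalg.lstsq(_design(X), np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_wear(X, beta):
    """Map explanatory variables to a one-dimensional response."""
    return _design(X) @ beta


# Synthetic example: 200 observations of 3 explanatory variables.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
true_beta = np.array([0.5, 1.0, -2.0, 0.3])  # intercept first
y_toy = true_beta[0] + X_toy @ true_beta[1:] + 0.01 * rng.normal(size=200)

beta_hat = fit_wear_regression(X_toy, y_toy)
wear_index = predict_wear(X_toy, beta_hat)
```

Once the coefficients are estimated on trajectories with a known target, the same `beta_hat` can be applied to the readings of other trajectories to obtain their response values.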

### Synthetic Wear Level Index Estimation

To assign the target variable, a mathematical model for the synthetic data has been established. This makes it possible to model a useful prognostic output for raw measurement data. Since the exact behavior of degradation change is known, the coefficient variables can be calculated with regard to the operational setting, and the differences caused by the noise.

Here, *t* is the time unit and *l* is the length of the time series representing the full set of operations. This function forces the wear level to increase exponentially. Fig. 1 shows the wear levels for different operational lengths.

The datasets FD002 and FD004 consist of a set of operational regimes, but the degradation trends can be clearly seen after the readings at each regime are selected and monitored separately. In order to increase the performance of the multiple linear regression model, the readings at each regime order can be clustered and the dimensionality reduction is applied into these clustered readings [9].
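Grouping the readings by regime before further processing can be illustrated with a per-regime standardization. The sketch below assumes that a regime label is already available for every reading (for example from clustering the operational settings); the function name `standardize_by_regime`, the labels `"A"` and `"B"`, and the sensor values are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_regime(readings, regimes):
    """Z-score each reading against the mean and std of its own regime."""
    groups = defaultdict(list)
    for value, regime in zip(readings, regimes):
        groups[regime].append(value)

    # Per-regime mean and population standard deviation.
    stats = {r: (mean(v), pstdev(v)) for r, v in groups.items()}

    scaled = []
    for value, regime in zip(readings, regimes):
        mu, sigma = stats[regime]
        # A regime with constant readings carries no trend information.
        scaled.append((value - mu) / sigma if sigma > 0 else 0.0)
    return scaled


# One sensor observed under two regimes with very different value ranges.
toy_readings = [100.0, 102.0, 500.0, 104.0, 510.0, 520.0]
toy_regimes = ["A", "A", "B", "A", "B", "B"]
toy_scaled = standardize_by_regime(toy_readings, toy_regimes)
```

In the raw series the jumps between regimes dominate; after scaling, the readings of both regimes share a common scale and the rising trend within each regime becomes comparable.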

## Failure Prognosis

Training trajectories cover the full operational lifetime of the engines, and failure occurs at a certain point that is accepted as the threshold level for wear growth. Test subsets, on the other hand, end some time prior to failure. This means that the time to failure is unknown and that no real data are available to train on the remaining steps.

In the absence of future data steps, the state-space model predicts the future behavior of the test subsets through a direct connection to the reference training trajectories. The model must first be trained on a training subset and then switched to estimation mode to make multi-step ahead remaining useful life estimations using only the external test trajectories.

### State-Space Estimation Model

The proposed multi-step ahead prediction algorithm estimates a continuous-time state-space model of order *nx* using the frequency-domain data, the recurrence relation of wear level index. The function generates a state-space model object with identifiable parameters [17].

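The state-space model with input *u*, output *y* and error term (disturbance) *e* is represented by the following equations in continuous time:

\[\dot{x}_t = A x_t + B u_t + K e_t\]

\[y_t = C x_t + D u_t + e_t\]

where *A*, *B*, *C*, *D* and *K* are state-space matrices, and \(x_t\) is the vector of *nx* states.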

After the dimensionality of the data is reduced to a single wear level index for each trajectory, the wear growth model is expected to learn the pattern from historical data and to estimate the remaining useful life until the pattern exceeds the threshold point. Although the state-space model can be trained on the cleaned vectors, it cannot produce multi-step long-term predictions when exponential growth is present, as is the case in Fig. 2. The exponential series is therefore transformed so that the model can perform well. Each subsequent value of the series is then defined as a function of the preceding values [18].

The proposed model matches the wear level index between the test trajectories and the corresponding part of the training trajectories, but the model is interested in the recurrence relation which is a state in the model. After the data are recalculated as shown in Fig. 3, the wear index patterns take a stationary form rather than being non-stationary. The proposed model has an arbitrary state that can be transformed so that the stationary state has meaning, in this case the recurrence relation of the exponential wear index.
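The effect of a recurrence relation on an exponential series can be shown with a small example. The exact transformation used in the paper is not reproduced here, so the sketch assumes the simplest candidate, the ratio of consecutive values, which is constant for a pure exponential. The helper names, the growth rate of 0.05 and the threshold of 1.0 are illustrative assumptions.

```python
import math


def consecutive_ratio(series):
    """Recurrence relation r_t = w_t / w_(t-1) for t >= 1."""
    return [cur / prev for prev, cur in zip(series, series[1:])]


def steps_to_threshold(last_value, ratio, threshold, max_steps=10_000):
    """Count how many steps of w <- w * ratio are needed to reach threshold."""
    if ratio <= 1.0:
        raise ValueError("ratio must exceed 1 for a growing wear index")
    steps = 0
    value = last_value
    while value < threshold and steps < max_steps:
        value *= ratio
        steps += 1
    return steps


# Non-stationary exponential wear index observed for 50 cycles.
toy_wear = [0.01 * math.exp(0.05 * t) for t in range(50)]

# The ratio series is flat, i.e. stationary.
toy_ratios = consecutive_ratio(toy_wear)
mean_ratio = sum(toy_ratios) / len(toy_ratios)

# Extrapolate the last observed value until it crosses the threshold.
remaining_cycles = steps_to_threshold(toy_wear[-1], mean_ratio, threshold=1.0)
```

With noisy data the ratios would scatter around a constant level instead of matching it exactly, which is the kind of stationary pattern a model can carry forward over many steps.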

## Results and Discussion

The C-MAPSS turbofan dataset provides a separate vector of true remaining useful life values for the test data series. Performance evaluation metrics based on the estimation error can be computed against these true RUL values. These metrics have demonstrated their practical relevance in prognostic designs and have been adopted for multi-step predictions. The metrics used in this research are based on [14, 19].

**Mean Absolute Error**

**Mean Absolute Percentage Error**

**Mean Square Error**

**False Positive Rate (FP) and False Negative Rate (FN)**
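The metrics listed above can be computed from the true and predicted RUL vectors as sketched below. MAE, MAPE and MSE follow their standard definitions. For FP and FN the sketch assumes that an early prediction (predicted RUL below the true value) counts as a false positive and a late prediction as a false negative, with no tolerance band; this convention, the function name and the toy values are assumptions for illustration.

```python
def prognostic_metrics(true_rul, predicted_rul):
    """Error-based metrics for RUL predictions (true RUL must be non-zero)."""
    errors = [p - t for t, p in zip(true_rul, predicted_rul)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / t for e, t in zip(errors, true_rul)) / n
    mse = sum(e * e for e in errors) / n

    # Assumed convention: early prediction = FP, late prediction = FN.
    fp = 100.0 * sum(1 for e in errors if e < 0) / n
    fn = 100.0 * sum(1 for e in errors if e > 0) / n

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "FP": fp, "FN": fn}


# Toy example with four test trajectories (illustrative values only).
toy_true = [100, 50, 20, 80]
toy_pred = [90, 55, 20, 100]
toy_metrics = prognostic_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Under this convention an exact prediction counts as neither early nor late, so FP and FN only sum to 100% when no prediction matches the true value exactly.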

Table 2 Prognostic performance metrics

Dataset | MAE | MAPE | MSE | FP (%) | FN (%) |
---|---|---|---|---|---|
FD001 | 13.6 | 18.2 | 332.6 | 60 | 40 |
FD002 | 16.2 | 26.1 | 555.5 | 52 | 48 |
FD003 | 15.7 | 20.9 | 498.5 | 63 | 37 |
FD004 | 18.3 | 24.3 | 630.3 | 56 | 44 |

Table 2 shows the prognostic metric results. The multi-step forecasts over the long-term cycles fall close to the true remaining useful life values. The performance metrics were selected to determine whether the designed algorithm and its multi-step predictions are of practical use. The developed model appears to give promising results for multi-step long-term time series predictions of exponential wear growth. Training of the model achieved the desired learning, and training performance increased substantially with the use of multiple predictor functions and the recurrence relation calculation.

## Conclusion

In this paper, a multiple linear regression-based dimensionality reduction model is proposed for multi-step ahead remaining useful life estimation. The prediction method builds on a state-space model using frequency-domain data.

The performance of the proposed prognostic method is evaluated on four subsets of the turbofan engine degradation simulation dataset, which were simulated under different combinations of operational conditions and fault modes. The results show that the combination and filtering of models can yield a low error rate in remaining useful life prediction.

Analysis of the multi-step ahead estimation suggests that the model can determine the remaining useful life of an average operating system, and can adjust the estimation over time-based usage data. It is also observed that the dimensionality reduction model can detect the initial wear levels of different trajectories.

## References

- 1. A.K.S. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. **20**, 1483–1510 (2006)
- 2. J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel, Prognostics and health management design for rotary machinery systems: reviews, methodology and applications. Mech. Syst. Signal Process. **42**, 314–334 (2014)
- 3. M. Pecht, *Prognostics and Health Management of Electronics* (Wiley, Hoboken, 2008)
- 4. N.M. Vichare, M.G. Pecht, Prognostics and health management of electronics. IEEE Trans. Compon. Packag. Technol. **29**, 222–229 (2006)
- 5. B. Saha, K. Goebel, S. Poll, J. Christophersen, Prognostics methods for battery health monitoring using a Bayesian framework. IEEE Trans. Instrum. Meas. **58**, 291–296 (2009)
- 6. K. Goebel, B. Saha, A. Saxena, J.R. Celaya, J.P. Christophersen, Prognostics in battery health management. IEEE Instrum. Meas. Mag. **11**(4), 33–40 (2008)
- 7. C.S. Byington, M. Watson, D. Edwards, P. Stoelting, A model-based approach to prognostics and health management for flight control actuators, in Proceedings of Aerospace Conference, vol. 6, pp. 3551–3562 (2004)
- 8. T. Wang, J. Yu, D. Siegel, J. Lee, A similarity-based prognostics approach for remaining useful life estimation of engineered systems, in Proceedings of International Conference on Prognostics and Health Management, PHM, vol. 2008, pp. 1–6 (2008)
- 9. E. Ramasso, Investigating computational geometry for failure prognostics. Int. J. Progn. Health Manag. **5**, 005 (2014)
- 10. V.V. Osipov, D.G. Luchinsky, V.N. Smelyanskiy, C. Kiris, D.A. Timucin, S.H. Lee, In-flight failure decision and prognostics for the solid rocket booster, in Proceedings of AIAA 43rd AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Cincinnati, OH (2007)
- 11. S. Sankararaman, K. Goebel, Uncertainty in prognostics and systems health management. Int. J. Progn. Health Manag. **6**, 010 (2015)
- 12. S. Uckun, K. Goebel, P.J.F. Lucas, Standardizing research methods for prognostics, in Proceedings of International Conference on Prognostics and Health Management, PHM 2008 (2008)
- 13. A. Saxena, K. Goebel, “Turbofan engine degradation simulation data set”, NASA Ames prognostics data repository (NASA Ames Research Center, Moffett Field, CA, 2008), https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-datarepository/
- 14. A. Saxena, K. Goebel, D. Simon, N. Eklund, Damage propagation modeling for aircraft engine run-to-failure simulation, in Proceedings of International Conference on Prognostics and Health Management, PHM 2008 (2008)
- 15. S. Chatterjee, S.H. Ali, Influential observations, high leverage points, and outliers in linear regression. Stat. Sci. **1**, 1–6 (2008)
- 16. D.A. Freedman, *Statistical Models: Theory and Practice* (Cambridge University Press, New York, 2009), pp. 41–60
- 17. L. Ljung, *System Identification: Theory for the User*, PTR Prentice Hall Information and System Sciences Series (Prentice Hall, New Jersey, 1999), pp. 81–90
- 18. O. Bektas, J.A. Jones, NARX time series model for remaining useful life estimation of gas turbine engines, in Proceedings of Third European Conference of the Prognostics and Health Management Society (2016)
- 19. K. Goebel, A. Saxena, S. Saha, B. Saha, J. Celaya, *Machine Learning and Knowledge Discovery for Engineering Systems Health Management: Prognostic Performance Metrics* (CRC Press, New York, 2011), p. 147

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.