A comprehensive review of artificial intelligence-based approaches for rolling element bearing PHM: shallow and deep learning
Abstract
The objective of this paper is to present a comprehensive review of the contemporary techniques for fault detection, diagnosis, and prognosis of rolling element bearings (REBs). Data-driven approaches, as opposed to model-based approaches, are gaining in popularity due to the availability of low-cost sensors and big data. This paper first reviews the fundamentals of prognostics and health management (PHM) techniques for REBs. A brief description of the different bearing-failure modes is given; then, the paper presents a comprehensive representation of the different health features (indexes, criteria) used for REB fault diagnostics and prognostics. Thus, the paper provides an overall platform for researchers, system engineers, and experts to select and adopt the best fit for their applications. Second, the paper provides overviews of contemporary REB PHM techniques with a specific focus on modern artificial intelligence (AI) techniques (i.e., shallow learning algorithms). Finally, deep-learning approaches for fault detection, diagnosis, and prognosis for REB are comprehensively reviewed.
Keywords
Deep learning, Diagnosis, Fault detection, Rolling element bearing, Shallow learning, Prognostics and health management
1 Introduction
PHM, which aims to detect machine breakdown and prevent consequent accidents that bring economic losses, is a wide research domain. This paper focuses on reviewing and summarizing contemporary PHM techniques applied to rotating electrical machines (REMs). REMs are at the heart of most engineering processes (due to their relatively low price and operational ease [11], [12]) and REM failures are one of the foremost causes of breakdown in industry, causing high costs of operating maintenance. Furthermore, rolling element bearing (REB) faults account for 45–55% of REM failures [13, 14] and for about 41% of motor faults, followed by stator faults (37%) and rotor faults (10%) [15].
Taking this into consideration, this paper will present a brief description of the different bearing failure modes, and a comprehensive description of the different health features (indexes, criteria) used for REB fault diagnostics and prognostics, with the goal of providing an overall platform for researchers, system engineers, and experts to select and adopt the best fit for their applications. This paper is organized as follows: Sect. 2 briefly introduces the different bearing failure modes and their causes, followed by a comprehensive representation of the different health features (indexes and criteria). The different existing shallow-learning algorithms for REB PHM are detailed in Sect. 3. Section 4 provides the most recent investigations and studies that are based on the hottest subfield, deep learning-based REB fault detection, diagnosis, and prognosis. Finally, a summary and concluding remarks are given in Sect. 5.
A prior survey paper [21] gives a review of the emerging research work related to deep learning and new trends related to its use in machine health monitoring for different applications and systems. In addition, the review paper of Zurita et al. [22] mainly reviewed the state-of-the-art vibration condition-based monitoring of gears and bearings that are based on advanced digital signal processing techniques and artificial intelligence methods. In contrast to these prior works, this paper focuses only on reviewing contemporary learning algorithms (i.e., the shallow learning algorithms and the deep learning algorithm and its variants) for REB fault detection, diagnosis, and prognosis techniques. Contemporary PHM techniques are summarized as follows.
Modern engineering systems are embracing more and more user-friendly data acquisition tools and low-cost sensors that are connected to the internet. Therefore, PHM researchers and practitioners are adopting contemporary techniques, i.e., smart data-driven approaches (SL-based PHM and DL-based PHM techniques) that have been developed in the last decade. These techniques aim to synthesize information available from the acquired data to better represent the system’s health condition. Further, the latter (i.e., DL-based PHM) extracts the best-suited features from big data and better represents the system health condition in a hierarchical architecture. With the propagation of acquired data, DL-based PHM techniques model the high-level representation of the complex multivariate nonlinear relationship behind the data without the need for a profound understanding of the system physics; this eliminates the need for a significant amount of human labor. In contrast, SL-based PHM methods require a manual feature extraction step, which may require domain knowledge. Thus, these methods can face problems in extracting useful representations from big data.
2 Fundamentals of rolling element bearing (REB) prognostics and health management (PHM)
In industry, the health of many machines depends on the robustness and reliability of the REBs. Failures may appear in REBs during operation or before (i.e., during the manufacturing process). From a prior FMECA (failure modes, effects, and criticality analysis) study of servo motors, which are the core component for mechanism control of electrical machinery, bearing faults were shown to have the highest frequency, severity, and criticality [24]. Therefore, detection, diagnosis, and prognosis of these defects are important for prognostics and health management, as well as for quality inspection of bearings [25].
2.1 Bearing failure modes

Fatigue begins as a tiny crack on the bearing surface (rollers or races) due to a material structure change, which is caused by repeated stress in the contact areas.

Wear comes from the presence of dirt or foreign particles inside the bearing due to inaccurate sealing or inadequate lubrication (contamination).

Electric erosion is damage (in the form of craters) to one of the bearing parts (rollers or races) caused by an electric current passing through the bearing.

Corrosion comes from the presence of water or corrosive agents inside the bearing due to damaged seals, acidic lubricants, or a sudden large change in operating temperature.

Plastic deformation occurs mainly when the bearing is subjected to an excessive load that indents the raceways.

Fracture and cracking result from stress caused by rough treatment (impacts) or from cyclic stress. Additionally, fracture and cracking can be caused by excessive heating (thermal stress).
2.2 REB health features
Various features used in REB PHM techniques
No.  Features  Definition  Physical meaning 

Time domain features  
1  Maximum [29]  \(I_{\max} = \max_{k = 1 \ldots N} x(k)\)  Kinetic energy related
2  Minimum [29]  \(I_{\min} = \min_{k = 1 \ldots N} x(k)\)  Kinetic energy related
3  Absolute maximum [30]  \(I_{\text{amax}} = \max_{k = 1 \ldots N} \left| x(k) \right|\)  Kinetic energy related
4  Sum [36]  \(I_{\text{sum}} = \sum_{k = 1}^{N} x(k)\)  Kinetic energy related
5  Median [36]  \(I_{\text{med}} = \operatorname{median}_{k = 1 \ldots N} x(k)\)  Kinetic energy related
6  Most frequent value [36]  \(I_{\text{mod}} = \operatorname{mode}_{k = 1 \ldots N} x(k)\)  Kinetic energy related
7  Mean [28]  \(I_{\text{mean}} = \bar{x} = \frac{1}{N}\sum_{k = 1}^{N} x(k)\)  Kinetic energy related
8  Absolute mean [30]  \(I_{\text{amean}} = \left| \bar{x} \right| = \frac{1}{N}\sum_{k = 1}^{N} \left| x(k) \right|\)  Kinetic energy related
9  Mean absolute deviation [36]  \(I_{\text{mad}} = \operatorname{mad}_{k = 1 \ldots N} x(k)\)  Kinetic energy related
10  Harmonic mean [36]  \(I_{\text{har}} = N \Big/ \sum_{k = 1}^{N} \frac{1}{x(k)}\)  Gives the truest average energy
11  Trapezoidal numerical integration [36]  \(I_{\text{trap}} = \operatorname{trapz}_{k = 1 \ldots N} x(k)\)  None
12  Percentiles [36]  \(I_{\text{prc}} = \operatorname{prctile}_{k = 1 \ldots N} x(k)\)  None
13  Interquartile range (IQR) [36]  \(I_{\text{IQR}} = \operatorname{iqr}_{k = 1 \ldots N} x(k)\)  None
14  Variance [29]  \(I_{\sigma^{2}} = \frac{1}{N}\sum_{k = 1}^{N} \left( x(k) - I_{\text{mean}} \right)^{2}\)  Energy quantification related
15  Root mean square (RMS)  \(I_{\text{rms}} = \sqrt{\frac{1}{N}\sum_{k = 1}^{N} x(k)^{2}}\)  Kinetic energy related
16  RMS error (RMSe) [32]  \(I_{\text{rmse}} = \sqrt{\frac{1}{N}\sum_{k = 1}^{N} \left( x(k) - I_{\text{mean}} \right)^{2}}\)  Kinetic energy related
17  Delta RMS [36]  \(I_{\text{drms}} = I_{\text{rms}}^{j} - I_{\text{rms}}^{j - 1}\), where j is the current segment of the time record and j − 1 is the previous segment  Kinetic energy related
18  Standard deviation [29]  \(I_{\sigma} = \sqrt{I_{\sigma^{2}}}\)  Energy quantification related
19  Peak value [28]  \(I_{p_{v}} = \frac{1}{2}\left[ I_{\max} - I_{\min} \right]\)  Kinetic energy related
20  Peak to peak [28]  \(I_{\text{p to p}} = I_{\max} - I_{\min}\)  Kinetic energy related
21  Peak to RMS [29]  \(I_{\text{p to rms}} = \frac{\left| I_{\max} \right|}{I_{\text{rms}}}\)  Kinetic energy related
22  Skewness [28]  \(I_{\text{sk}} = \frac{\frac{1}{N}\sum_{k = 1}^{N} \left( x(k) - I_{\text{mean}} \right)^{3}}{\left( I_{\sigma} \right)^{3}}\)  Data statistic related
23  Kurtosis [40]  \(I_{\text{kur}} = \frac{\frac{1}{N}\sum_{k = 1}^{N} \left( x(k) - I_{\text{mean}} \right)^{4}}{\left( I_{\sigma} \right)^{4}}\)  Data statistic related
24  Crest factor [28]  \(I_{\text{cf}} = \frac{I_{p_{v}}}{I_{\text{rms}}}\)  Sinusoidal wave shape related
25  Clearance factor [28]  \(I_{\text{clf}} = \frac{I_{p_{v}}}{\left( I_{\text{mean}} \right)^{2}}\)  None
26  Impulse factor [28]  \(I_{\text{if}} = \frac{I_{p_{v}}}{I_{\text{amean}}}\)  Sinusoidal wave shape related
27  Shape factor [28]  \(I_{\text{sf}} = \frac{I_{\text{rms}}}{I_{\text{amean}}}\)  Sinusoidal wave shape related
28  Margin factor [34]  \(I_{\text{mf}} = \frac{I_{\text{amax}}}{I_{\sigma^{2}}}\)  None
29  Coefficient of variance [30]  \(I_{\text{cv}} = \frac{I_{\text{mean}}}{I_{\sigma}}\)  None
30  Coefficient of skewness [30]  \(I_{\text{csk}} = \frac{\frac{1}{N}\sum_{k = 1}^{N} x(k)^{3}}{\left( I_{\sigma} \right)^{3}}\)  None
31  Coefficient of kurtosis [30]  \(I_{\text{ckur}} = \frac{\frac{1}{N}\sum_{k = 1}^{N} x(k)^{4}}{\left( I_{\sigma} \right)^{4}}\)  None
32  TALAF [36]  \(I_{\text{TALAF}} = \log\left( I_{\text{kur}} + \frac{I_{\text{rms}}}{I_{\text{rms}_{h}}} \right)\)  None
33  THIKAT [36]  \(I_{\text{THIKAT}} = \log\left( \left( I_{\text{kur}} \right)^{I_{\text{cf}}} + \left( \frac{I_{\text{rms}}}{I_{\text{rms}_{h}}} \right)^{I_{p_{v}}} \right)\)  None
34  Normalized sixth central moment [36]  \(I_{\text{kur6}} = \frac{\frac{1}{N}\sum_{k = 1}^{N} \left( x(k) - I_{\text{mean}} \right)^{6}}{\left( I_{\sigma} \right)^{6}}\)  None
35  Add factor 1 [30]  \(I_{1} = \frac{I_{\text{amax}}}{I_{\text{sd}} \cdot I_{\sigma^{2}}}\)  None
36  Add factor 2 [30]  \(I_{2} = \frac{I_{\text{kur}} \cdot I_{\text{cf}}}{I_{\text{sd}}}\)  None
37  Fisher criterion [31]  \(I_{\text{fisherc}} = \frac{\left( I_{\text{mean}} - I_{\text{mean}_{h}} \right)^{2}}{I_{\text{sd}}^{2} + I_{\text{sd}_{h}}^{2}}\)  None
38  Square root of amplitude [34]  \(I_{\text{sra}} = \left( \frac{1}{N}\sum_{k = 1}^{N} \sqrt{\left| x(k) \right|} \right)^{2}\)  None
39  Euclidean distance [35]  \(I_{\text{ed}} = \sqrt{\sum_{k = 1}^{N} \left( I_{h}(k) - I_{f}(k) \right)^{2}}\)  None
40  Sum square error distance [37]  \(I_{\text{sse}} = \left\| I_{h} - I_{f} \right\|^{2}\)  None
41  Mahalanobis distance [33]  \(I_{\text{mahd}} = \sqrt{\left( I_{h} - I_{f} \right) C^{-1} \left( I_{h} - I_{f} \right)}\); \(I_{\text{mahd}} = \sqrt{\left( I_{f} - I_{f_{\text{mean}}} \right) C^{-1} \left( I_{f} - I_{f_{\text{mean}}} \right)}\)  None
42  Manhattan distance [39]  \(I_{\text{manhd}} = \sum_{k = 1}^{N} \left| I_{h}(k) - I_{f}(k) \right|\)  None
43  Median error distance [39]  \(I_{\text{meded}} = \arg\min \sum_{k = 1}^{N} \left\| I_{h}(k) - I_{f}(k) \right\|_{2}\)  None
Frequency domain features  
44  Shaft rotational frequency [37]  \(I_{\text{srf}} = f_{r} = \frac{N_{\text{rpm}}}{60}\)  Position change of main frequency
45  Outer-race fault (ORF) frequency [39]  \(I_{\text{orf}} = \frac{N_{b}}{2} f_{r} \left( 1 - \frac{d_{b} \cos\beta}{d_{p}} \right)\)  Occurrence of fault frequency
46  Inner-race fault (IRF) frequency [39]  \(I_{\text{irf}} = \frac{N_{b}}{2} f_{r} \left( 1 + \frac{d_{b} \cos\beta}{d_{p}} \right)\)  Occurrence of fault frequency
47  Roller (ball) fault (BBF) frequency [39]  \(I_{\text{bbf}} = \frac{d_{p}}{d_{b}} f_{r} \left( 1 - \left( \frac{d_{b} \cos\beta}{d_{p}} \right)^{2} \right)\)  Occurrence of fault frequency
48  Cage fault frequency [37]  \(I_{\text{cff}} = \frac{N_{b}}{2} f_{r} \left( 1 - \frac{d_{b} \cos\beta}{d_{p}} \right)\)  Occurrence of fault frequency
49  Mean frequency [41]  \(I_{\text{meanf}} = \frac{1}{N}\sum_{k = 1}^{N} f_{k}\)  Main frequency position changes
50  Variance [29]  \(I_{\sigma^{2} f} = \frac{1}{N}\sum_{k = 1}^{N} \left( f_{k} - I_{\text{meanf}} \right)^{2}\)  Frequency quantification related
51  RMS frequency  \(I_{\text{rmsf}} = \sqrt{\frac{1}{N}\sum_{k = 1}^{N} f_{k}^{2}}\)  Kinetic frequency related
52  Frequency center  \(I_{\text{fcenter}} = \frac{1}{N}\sum_{k = 1}^{N} f_{k}\)  Main frequency position changes
53  Root variance frequency  \(I_{\text{root}\sigma^{2} f} = \sqrt{I_{\sigma^{2} f}}\)  Convergence of spectrum power
Envelope spectrum features  
54  RMS frequency of the 1st harmonic [34]  \(I_{\text{rmsf 1h}} = \sqrt{\frac{1}{b_{1} - a_{1}}\sum_{k = a_{1}}^{b_{1}} f_{k}^{2}}\)  Certain frequency range magnitude
55  RMS frequency of the 2nd harmonic [34]  \(I_{\text{rmsf 2h}} = \sqrt{\frac{1}{b_{2} - a_{2}}\sum_{k = a_{2}}^{b_{2}} f_{k}^{2}}\)  Certain frequency range magnitude
56  RMS frequency of the 3rd harmonic [34]  \(I_{\text{rmsf 3h}} = \sqrt{\frac{1}{b_{3} - a_{3}}\sum_{k = a_{3}}^{b_{3}} f_{k}^{2}}\)  Certain frequency range magnitude
Statistical features  
57  \(T^{2}\) statistics [37]  \(I_{T^{2}} = t^{T} \operatorname{Cov}^{-1} t\)  None
58  Q statistics [37]  \(I_{Q} = \varepsilon^{T} \varepsilon\)  None
59  Residual error matrix [32]  \(I_{\text{rem}} = \hat{Y} - Y\)  None
60  Reconstruction error [38]  \(I_{\text{rec error}} = \left\| x_{\text{new}} - (WW^{T}) x_{\text{new}} \right\|^{2}\)  None
61  Bhattacharyya distance [39]  \(I_{\text{Bhattacharyya}} = \frac{1}{8}(\mu_{1} - \mu_{2})^{T} \left[ \frac{C_{1} + C_{2}}{2} \right]^{-1} (\mu_{1} - \mu_{2}) + \frac{1}{2}\ln \frac{\left| (C_{1} + C_{2})/2 \right|}{\left| C_{1} \right|^{\frac{1}{2}} \left| C_{2} \right|^{\frac{1}{2}}}\)  None
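As an illustration of how several of the time domain features in the table above can be evaluated, the following minimal Python sketch computes a subset of them for a sampled signal x(k). The function name `time_domain_features` and the synthetic sine-wave input are assumptions made for this example only; a signal with zero standard deviation or zero absolute mean would need separate handling.

```python
import math

def time_domain_features(x):
    """Compute a subset of the time domain health features for a sampled signal x."""
    n = len(x)
    mean = sum(x) / n                                   # I_mean
    amean = sum(abs(v) for v in x) / n                  # I_amean
    var = sum((v - mean) ** 2 for v in x) / n           # I_sigma^2 (divides by N)
    std = math.sqrt(var)                                # I_sigma
    rms = math.sqrt(sum(v * v for v in x) / n)          # I_rms
    peak = 0.5 * (max(x) - min(x))                      # I_pv
    return {
        "mean": mean,
        "rms": rms,
        "std": std,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "crest_factor": peak / rms,
        "impulse_factor": peak / amean,
        "shape_factor": rms / amean,
    }

# Synthetic example (not measured data): a unit sine wave over one full period.
signal = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
features = time_domain_features(signal)
```

For a pure sine wave the kurtosis is 1.5 and the crest factor is the square root of 2, which gives a quick sanity check; impulsive bearing faults would be expected to raise both values.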
In addition, many techniques have been developed and applied. In the frequency domain [45, 46, 47, 48], these include power spectrum analysis, the fast Fourier transform (FFT), the discrete Fourier transform (DFT), the Welch method, and noise cancellation techniques. In the time–frequency domain [49, 50, 51], well-known techniques are the short-time Fourier transform, the Wigner–Ville distribution, the continuous wavelet transform (CWT), the discrete wavelet transform (DWT), and the wavelet packet transform (WPT).
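Spectral techniques of this kind are typically used to look for energy at the characteristic fault frequencies defined in the frequency domain rows of the feature table. The sketch below evaluates those definitions for the shaft, outer-race, inner-race, and ball fault frequencies. The function name and the bearing geometry are hypothetical values chosen for illustration, and the ball fault frequency follows the form given in the table (some references define a ball spin frequency that is half of this value).

```python
import math

def bearing_fault_frequencies(rpm, n_balls, d_ball, d_pitch, contact_angle_deg=0.0):
    """Characteristic frequencies in Hz, following the feature-table definitions."""
    f_r = rpm / 60.0                                           # I_srf: shaft frequency
    ratio = d_ball * math.cos(math.radians(contact_angle_deg)) / d_pitch
    return {
        "shaft": f_r,
        "outer_race": 0.5 * n_balls * f_r * (1.0 - ratio),     # I_orf
        "inner_race": 0.5 * n_balls * f_r * (1.0 + ratio),     # I_irf
        "ball": (d_pitch / d_ball) * f_r * (1.0 - ratio ** 2), # I_bbf
    }

# Hypothetical geometry for illustration only (not taken from any reviewed study):
# 9 balls of 8 mm diameter on a 40 mm pitch circle, zero contact angle, 1800 rpm.
freqs = bearing_fault_frequencies(rpm=1800, n_balls=9, d_ball=8.0, d_pitch=40.0)
```

With these inputs, the outer-race and inner-race frequencies sum to the number of balls times the shaft frequency, which follows directly from the two definitions and is a convenient consistency check.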
3 Shallow learning algorithms for REB PHM
This section presents a state-of-the-art review of SL-based PHM methods and their application to REB PHM. In an attempt to organize and classify the diverse SL-based REB PHM techniques, which may or may not originate from the artificial neural network (NN), three categories are proposed: statistical approaches, NN approaches, and combined methods. Further, the statistical approaches are subdivided, according to the nature and the task of each algorithm, into LDA-based REB PHM, SVM-based REB PHM, K-nearest neighbor (KNN)-based REB PHM, extreme learning machine (ELM)-based REB PHM, and other non-NN algorithms applicable to REB PHM. The combined methods are the ones that utilize a non-NN algorithm with a NN method, a NN algorithm with a signal processing technique, or a non-NN algorithm with a signal processing approach.
3.1 Statistical approaches for REB PHM
Several shallow learning algorithms are built on a shallow architecture that exploits the statistical properties of the data and uses this information to classify the data into already known groups [2]. The following section provides a detailed description of these statistical SL-based techniques as applied to REB PHM. The typical structure of each algorithm is briefly introduced, and its application to REB PHM is outlined to highlight its challenges, its pros and cons, and its latest advancements.
3.1.1 LDA-based REB PHM
Between- and within-class scatters of Fig. 5
Projection line  (a)  (b)
Between-class scatter  Small  Large
Within-class scatter  Large  Small
The LDA algorithm has been used to improve classification of ball bearing faults according to their severity level [53]. LDA was also used as a dimensionality reduction technique to find the dimensions of a few features that best discriminate a set of features extracted from raw vibration signals [54]. Zhao et al. [55] proposed a trace ratio version of LDA, which uses the between-class scatter matrix to evaluate the separability of different classes and the within-class scatter matrix to evaluate the compactness within each class. The extended discriminative subspace learning method was used for dealing with the trace ratio problem in linear discriminant analysis for a REB fault detection and diagnosis problem. A trace ratio LDA algorithm was also introduced by Jin et al. [56] and used to reduce the dimension and then to classify the motor bearing health conditions, which arose from single-point faults and generalized-roughness faults. Another form of LDA, called ∆LDA, was proposed by Ciabattoni et al. [57] to deal with fault data dimension reduction and fault detection issues with application to REB fault detection. ∆LDA was proposed to overcome the problem of a between-class scatter matrix trace very close to zero, which is the case when detecting different bearing faults. It did indeed improve the classification accuracy when the classes overlapped. In [58], a fault diagnosis of bearing damage was proposed in which the LDA algorithm was used to evaluate the current features generated by frequency selection in the stator current spectrum, and the fault diagnosis itself was performed by the Bayes classifier.
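The role of the between-class and within-class scatters can be made concrete with a small two-class, two-feature example. The following sketch computes the classical Fisher direction w = S_w^{-1}(m_a − m_b) and compares the scatter ratio obtained along different projection lines. It illustrates the standard two-class LDA criterion, not the trace ratio or ∆LDA variants of the cited works; the function names and the data points are hypothetical.

```python
def mean_vec(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, center):
    """Sum of outer products (r - center)(r - center)^T as a nested list."""
    d = len(center)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - center[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s

def fisher_direction_2d(class_a, class_b):
    """Two-class Fisher direction w = S_w^{-1} (m_a - m_b) for 2-D features."""
    m_a, m_b = mean_vec(class_a), mean_vec(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    inv = [[s_w[1][1] / det, -s_w[0][1] / det],
           [-s_w[1][0] / det, s_w[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def scatter_ratio(class_a, class_b, w):
    """Between-class over within-class scatter of the data projected onto w."""
    proj_a = [r[0] * w[0] + r[1] * w[1] for r in class_a]
    proj_b = [r[0] * w[0] + r[1] * w[1] for r in class_b]
    mu_a = sum(proj_a) / len(proj_a)
    mu_b = sum(proj_b) / len(proj_b)
    between = (mu_a - mu_b) ** 2
    within = sum((p - mu_a) ** 2 for p in proj_a) + sum((p - mu_b) ** 2 for p in proj_b)
    return between / within

# Hypothetical 2-D feature vectors for two bearing health classes.
healthy = [(1.0, 0.0), (1.2, 2.0), (0.8, -2.0), (1.0, 1.0), (1.0, -1.0)]
faulty = [(2.0, 0.0), (2.2, 2.0), (1.8, -2.0), (2.0, 1.0), (2.0, -1.0)]

w = fisher_direction_2d(healthy, faulty)
ratio_fisher = scatter_ratio(healthy, faulty, w)
ratio_axis_1 = scatter_ratio(healthy, faulty, (1.0, 0.0))
ratio_axis_2 = scatter_ratio(healthy, faulty, (0.0, 1.0))
```

Projecting onto the second feature axis gives zero between-class scatter with large within-class scatter, while the Fisher direction yields a larger ratio than either raw axis, mirroring the small/large contrast summarized for the two projection lines.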
3.1.2 SVM-based REB PHM
The classifier in machine learning and statistics learns from the data input given to it and then uses this learning to classify a new observation. The same can be done to detect and then diagnose various bearing faults (outer-race fault (ORF), inner-race fault (IRF), and ball bearing fault (BBF) or cage faults). One of the most used classifier-based PHM techniques is the SVM.
Numerous researchers have modified SVM algorithms for various reasons. Sugumaran et al. [60] used the SVM and proximal-SVM (PSVM) classifiers to find the optimal number of time domain statistical and histogram features of a vibration signal. A hybrid, two-stage one-against-all SVM approach was proposed for REB fault diagnosis in [61] to predict the type of faults more accurately. In the first SVM stage, the vibration signal is classified as either normal or faulty. Then, the fault types are classified in the second SVM stage. In addition, one-class ν-SVM, which uses only the normal state data, was used in an automatic bearing fault diagnosis [62]. To fully exploit the advantage of SVM, two multilayer kernel learning models, supervised incremental local tangent space alignment (SILTSA)-SVM and supervised linear local tangent space alignment (SLLTSA)-SVM, were proposed in [63] and applied to REB fault diagnosis. The proposed method combines the supervised method with the dimension reduction algorithms (ILTSA and LLTSA) [64]. In addition, to optimize the SVM parameters, which have a significant impact on classification performance, an improved ant colony optimization (IACO) algorithm was proposed to determine the parameters, and then the IACO-SVM algorithm was applied to rolling element bearing fault detection [65]. More recent studies were performed to further investigate the use of SVM for REB fault detection and diagnosis, including [66, 67, 68, 69, 70].
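For reference, the textbook soft-margin SVM that such variants build on can be written as follows. This is the standard formulation, not a reproduction of any specific cited method. The symbols are assumptions introduced here: \(x_{i}\) is the feature vector of sample i, \(y_{i} \in \{-1, +1\}\) is its label (e.g., normal versus faulty in a first classification stage), \(\phi\) is the feature map implied by the chosen kernel, \(\xi_{i}\) are slack variables, and C is the penalty parameter.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad \text{subject to} \quad
y_{i}\left( w^{T}\phi(x_{i}) + b \right) \ge 1 - \xi_{i},\quad
\xi_{i} \ge 0,\quad i = 1,\ldots,n

\hat{y}(x) = \operatorname{sign}\left( w^{T}\phi(x) + b \right)
```

The penalty C and the kernel parameters are the kind of quantities that a parameter search would typically tune, since they control the trade-off between margin width and training errors.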
3.1.3 K-nearest neighbor (KNN)-based REB PHM
KNN is a nonparametric (i.e., the model structure is determined mainly from the data without any assumptions on the underlying data distribution), lazy algorithm (i.e., as opposed to an eager algorithm, it does not learn discriminative functions but uses all the training data in the classification step) used for classification, in which the existing (historical) data are grouped into several classes that are then used to classify the new data. Thus, the main advantages of KNN are that the learning is very simple and easy to interpret (i.e., it has a physical meaning), and that it is an effective classification method for noisy training data and complex target functions, which makes it a well-suited algorithm for REB PHM [71]. However, there are also some disadvantages of KNN-based REB PHM. Specifically, since it is a lazy algorithm, it needs to store the entire training dataset and to compute distance values against all training samples; this is time- and power-consuming.
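The lazy nature of KNN is visible in a minimal implementation: no model is fitted, and every prediction scans the stored training set. The sketch below uses Euclidean distance and a majority vote; the function name, the two-feature vectors (loosely, an RMS value and a kurtosis value), and the class labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Lazy learning: distances to all stored samples are computed at query time.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical (rms, kurtosis) feature vectors; not measured data.
train_x = [(0.10, 3.0), (0.12, 3.1), (0.11, 2.9),
           (0.30, 6.0), (0.32, 6.5), (0.28, 5.8)]
train_y = ["healthy"] * 3 + ["outer-race fault"] * 3

prediction = knn_predict(train_x, train_y, (0.29, 6.1))
```

Because every query touches all training samples, the cost per prediction grows linearly with the size of the stored dataset, which is the storage and computation burden noted above. In practice the features would usually be scaled first so that no single feature dominates the distance.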
KNN was first used for fault detection and diagnosis of low speed (≤ 100 rpm) REBs in 1992 [72]. A combination of weighted KNN (WKNN) classifiers was proposed by Lei et al. [73] to overcome the two previously mentioned disadvantages of KNN-based REB fault detection and diagnosis. KNN was also combined with other classification methods to enhance the REB fault detection and diagnosis capability, such as with SVM [74], kernel PCA (KPCA) [75], the fuzzy C-means method [76], the binary differential evolution algorithm [77], or the K-star classifier [78]. More recently, an optimal KNN model was combined with KPCA to deal with bearing fault detection and diagnosis, in which the KNN was optimized using a particle swarm optimization method [79].
3.1.4 Extreme learning machine (ELM)-based REB PHM
To the authors’ knowledge, ELM was first applied alone to a REB fault diagnosis system by Razavi-Far and Saif [81] to provide incremental learning capability in nonstationary environments and to detect and diagnose bearing faults under the class imbalance condition. The proposed ELM methods adopted two state-of-the-art ensemble-based techniques: Learn++.CDS (Concept Drift with SMOTE) [82], which was used to overcome the class imbalance issue in nonstationary environments, and Learn++.NIE (nonstationary and imbalanced environment) [83], which was used to handle class-imbalanced data during the incremental phase in nonstationary environments. A more recent study that used ELM for REB condition monitoring was carried out by Mao et al. [84], in which they tried to solve the online imbalanced data problem that occurs when data are collected online in a sequential way and the number of fault data is much smaller than the number of normal data.
3.1.5 Other statistical algorithms for REB PHM
Sugumaran et al. [85] investigated the effectiveness of an automatic rule learning-based decision tree for classification when employing a fuzzy classifier. The decision tree was used to select the different extracted statistical features from the vibration signals, and then multiple membership functions based on the generated ‘if–then’ rules were designed. Finally, a fuzzy inference engine was built and used to classify the REB health conditions based on a predefined threshold. Then, they [86] proposed a decision tree-based method that uses histogram features to improve the previous results when the data set contains only a small number of data points.
Other non-NN methods were investigated to detect and diagnose an REB’s health state. Yu [87] proposed a supervised-learning-based local and nonlocal preserving projection (SLNPP) method; Kankar et al. [88] used learning vector quantization (LVQ) as a REB fault classifier. In Cao et al. [89], a novel fault diagnosis method based on semi-supervised fuzzy C-means (SFCM) cluster analysis was developed; and more recently, targeting the nonstationary and non-Gaussian characteristics of a vibration signal from a faulty rolling bearing, Han et al. [90] developed a VMD-AR (variational mode decomposition-autoregressive) model and investigated diagnosing REB faults using the random forest learning (RFL) classifier. The VMD was applied to decompose vibration signals into a series of stationary component signals; then, an AR model was established for each component mode. The models were used as fault characteristic vectors. Finally, a novel RFL classifier was considered for pattern recognition to diagnose different bearing faults.
Mohsenzadeh et al. [91] introduced a novel sparse Bayesian learning (SBL) algorithm called the relevance sample feature machine (RSFM), which had the capability of choosing the relevant samples and the relevant features simultaneously for regression or classification problems. Further, it was concluded that the RSFM had the advantage of avoiding overfitting, resulting in less system complexity during the testing stage, and better generalization. Wong et al. [92] successfully adopted a novel structure that is based on a pairwise-coupled sparse Bayesian extreme learning committee machine to intelligently and simultaneously diagnose bearing faults.
A bearing fault diagnosis technique was also presented by Shen et al. [93] based on a transfer learning (TL) technique, which was not limited to the same field [94]; it used singular value decomposition (SVD) [95] as its feature extraction tool. The authors describe the main idea of the proposed TL method as [93] “to utilize selective auxiliary data to assist target data classification, where a weight adjustment between them is involved in the TrAdaBoost algorithm for enhanced diagnostic capability. In addition, negative transfer is avoided through the similarity judgment, thus improving accuracy and relaxing computational load of the presented approach.”
Manifold learning (ML) [96] techniques are widely used in cluster analysis, image processing, bioinformatics, etc., [97, 98, 99]. However, ML techniques are rarely used for fault diagnosis, and were only used as a nonlinear time series noise reduction method applied to the analysis of gearbox vibration signals with snaggletooth in [100]. Recently, Wang et al. [101] proposed a novel machinery REB fault diagnosis approach based on a statistical locally linear embedding (SLLE) manifold learning algorithm, which was an extension of LLE [102]. Another study, which applied the ML technique in combination with wavelet packet transform to detect weak transient signals for REB fault diagnosis, was carried out by Wang et al. [103]. This study proposed an extraction method, named waveform feature manifold (WFM), that used the binary wavelet packet transform to obtain the waveform feature space, which was then used to extract the weak signatures.
It should be noted that there are a few remaining learning techniques, such as the Bayesian learning (BL) [104] technique and the Widrow-Hoff learning (WHL) [105] algorithm. The authors did not find any study that applied these techniques to the bearing prognostics and health management field, although researchers may consider these techniques in the future.
3.2 Neural network approaches for REB PHM
Other SL-based REB PHM techniques that were constructed using a shallow structure originating from the artificial NN are grouped and reviewed in this subsection. It is worth noting that the deep learning methods originated as an extension of these NN-based techniques.
As stated above, the weights used in the network are calculated using forward/backward propagation. For the gradient descent method, the error gradient with respect to each weight must be computed to quantify the influence of each weight on the final error. Backpropagation is an efficient way to compute gradients of the cost function; it is commonly used to train the network [108]. The backpropagation procedure can be defined as follows: first, initialize the weights randomly; second, apply forward propagation through the neural network to obtain the output and the cost; third, apply backward propagation to calculate the influence of each weight on the cost (the error gradient); and finally, update the weights. These steps are repeated until the performance of the network is satisfactory.
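The four steps of this procedure can be sketched for a network with one sigmoid hidden layer and a single sigmoid output, trained with a squared-error cost. This is a minimal illustration under stated assumptions, not the training setup of any reviewed study: the function names, the network size, the learning rate, and the toy two-feature samples are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_hidden=3, lr=0.5, epochs=1000, seed=0):
    """Train a one-hidden-layer network by gradient descent; return the loss per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialize the weights randomly (last entry of each row is the bias).
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Step 2: forward propagation to obtain the output and the cost.
            h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
            y = sigmoid(sum(w * v for w, v in zip(w2, h)) + w2[-1])
            total += 0.5 * (y - target) ** 2
            # Step 3: backward propagation of the error gradient.
            delta_out = (y - target) * y * (1.0 - y)
            delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Step 4: update every weight along the negative gradient.
            for j in range(n_hidden):
                w2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_hid[j] * x[i]
                w1[j][-1] -= lr * delta_hid[j]
            w2[-1] -= lr * delta_out
        history.append(total / len(samples))
    return history

# Toy samples: (feature vector, label) with 0 = healthy and 1 = faulty.
samples = [((0.1, 0.2), 0.0), ((0.2, 0.1), 0.0),
           ((0.8, 0.9), 1.0), ((0.9, 0.8), 1.0)]
history = train(samples)
```

The returned loss history can be used as the stopping criterion mentioned above, for example by halting once the average cost falls below a chosen tolerance.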
Neural network (NN) techniques have been applied to the PHM field for different engineered systems [109, 110, 111]. One of the earliest works that used an NN for motor REB fault diagnosis was performed by Li et al. [112]. Frequency domain features were first extracted from the vibration signal (i.e., using the FFT), then an NN was trained to emulate the knowledge of vibration experts, who are very expensive to employ. Thus, motor REB fault diagnosis was achieved more efficiently and at a reduced cost. Another study [113] used time domain features (\(I_{rms}\), \(I_{\sigma^{2}}\), \(I_{sk}\), and \(I_{kur6}\)) for artificial NN-based bearing fault diagnosis instead of frequency domain features. Pandya et al. [114] used time–frequency domain features for NN-based REB fault diagnosis, applying wavelet packet decomposition for feature extraction from the measured vibration signal. A comparison study [115] of three types of artificial NNs, the multilayer perceptron (MLP), the radial basis function (RBF) network, and the probabilistic neural network (PNN), for bearing fault detection was also performed. With the goal of automating the process of feature extraction, fault detection and identification was performed for REMs: a matching pursuit analysis was used to extract time–frequency domain features that were subsequently used as inputs to a feedforward neural network (FFNN) to classify the different bearing conditions (healthy, IRF, ORF, and BBF) [116]. Gebraeel et al. [117] proposed a way to predict the residual life from vibration-based degradation signals to estimate the bearing failure time. They developed two classes of models, a single-bearing and a clustered-bearing neural network, to perform REB fault prognosis. Different combinations of time, frequency, and time–frequency domain features were also used with NN-based approaches for REB fault detection and diagnosis [118, 119, 120].
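As an illustration of the kind of time domain features mentioned above, the following sketch computes the root mean square, variance, skewness, and kurtosis of a sampled signal. It assumes that the subscripts rms, σ², sk, and kur in the feature symbols denote these four statistics, and it uses population (not sample) moments; the function name and the sine-wave input are hypothetical and do not reproduce the processing of [113].

```python
import math

def time_domain_features(signal):
    """Return RMS, variance, skewness, and kurtosis of a sampled signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    var = sum((v - mean) ** 2 for v in signal) / n            # population variance
    std = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in signal) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in signal) / (n * std ** 4)  # non-excess kurtosis
    return {"rms": rms, "var": var, "skew": skew, "kurt": kurt}

# Hypothetical input: one full period of a sine wave (not data from the reviewed studies).
samples = [math.sin(2.0 * math.pi * k / 64.0) for k in range(64)]
features = time_domain_features(samples)
```

For this smooth sine wave the RMS is about 0.707 and the kurtosis is 1.5; impulsive content in a signal raises the kurtosis, which is one reason it is a common bearing health feature.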
A non-intrusive artificial NN approach that used stator current signals instead of vibration signals was also previously applied for REB fault detection and diagnosis for a three-phase induction motor [121].
In the last 2 years, a comparative study was published [122] in which NN-based REB fault diagnosis was compared to SVM-based REB fault diagnosis; the results showed that the SVM-based approach performed better. An assessment study of the effect of the NN structure and parameters on REB fault diagnosis was carried out in [123], since no formula exists to select the optimal values of these network characteristics. A hybrid fault diagnosis method for REB faults in the field of gas turbine health management was investigated in [124]. This hybrid technique combined the S-transform algorithm [125] and the artificial NN method. The results showed that the S-transform could extract good time–frequency domain features from the raw vibration signals for REB fault detection and diagnosis.
3.3 Combined methods for REB PHM
Summary of the reviewed SL-based REB PHM methods

| SL-based REB PHM algorithms | Principle | Pros | Cons | Application | References |
|---|---|---|---|---|---|
| Statistical approaches | | | | | |
| LDA-based | LDA can find a linear combination of features that separates different classes. Its main objectives are either to reduce dimensionality or to perform classification. | Powerful statistical theory; generally outperforms centroid classification; powerful dimensional reduction technique | Difficult to use for more than two-class classification, i.e., not suitable for REB fault diagnosis; cannot work properly with nonlinear data; has a high misclassification percentage when the number of training data is small; struggles when dealing with missing data | Fault detection and diagnosis (FDD) | |
| | | | | Fault prognosis | None |
| SVM-based | The basic idea of SVM is to map the nonlinear input data into a feature space first, then the inner product of this feature space is nonlinearly mapped to the original space via kernels. Thus, the main purposes of SVM are classification and estimation. | Uses a well-established model, thus it could eliminate the need for experimental training data with the specific defective bearing; can handle nonlinear data; faster to train compared to NNs | Difficult to use for more than two-class classification, i.e., not suitable for REB fault diagnosis; must deal with the optimization of the kernel functions | FDD | |
| | | | | Fault prognosis | None |
| KNN-based | KNN is a nonparametric, lazy algorithm used for classification, in which the historical data are grouped into several classes to be used later to classify the new (unknown) data. | Learning is very simple; easy to interpret (i.e., has a physical meaning); can easily deal with more than two classes, i.e., suitable for REB fault diagnosis; effective classification method for noisy training data and complex target functions; can handle nonlinear data | Needs to store all the training data, i.e., memory-consuming; needs to compare distance values for all training samples, i.e., time- and power-consuming; not robust to outliers; its fault detection and diagnosis accuracy depends highly on the choice of the parameter k; when applied to REB PHM, overfitting may occur | FDD | |
| | | | | Fault prognosis | None |
| ELM-based | ELM uses a single-hidden-layer feedforward neural network (SLFNN) and, contrary to a conventional FNN, randomly chooses the hidden nodes and analytically determines the output weights of the SLFNN. | Provides a good generalization performance at an extremely fast learning speed; suitable for real-time REB fault diagnosis; has the ability of incremental learning in a nonstationary environment; can deal with the class imbalance issue in a nonstationary environment | Difficult to extend to a deep architecture since it has basically only two layers; the input weights and biases for hidden nodes are randomly selected, which may cause instability in the output nodes | FDD | |
| | | | | Fault prognosis | None |
| Others | Different algorithms were found and grouped here, including the fuzzy classifier, decision tree, RFL classifier, clustering method (fuzzy C-means), etc. | Fuzzy classifier efficiently handles uncertainty; decision tree is easy to interpret; RFL can deal with nonstationary signals | Fuzzy inference engine needs many data points in the data set; fuzzy classifier requires prior knowledge; complexity of the decision tree | FDD | |
| | | | | Fault prognosis | [87] |
| NN approaches | | | | | |
| | NNs are nonlinear and multivariable models that can be seen as the reference models for the model-based PHM techniques and as a classifier for the SL-based PHM methods. | Can handle nonlinear data; can easily deal with more than two classes, i.e., suitable for REB fault diagnosis; relatively easy to use | Slow to train (i.e., time-consuming algorithm); prior domain knowledge is needed for feature extraction; weak generalization ability; increasing its classification accuracy by a few percent can hugely bump up its scale | FDD | [107], [112, 113, 114, 115, 116], [118, 119, 120, 121, 122, 123, 124] |
| | | | | Fault prognosis | [117] |
| Combined approaches | | | | | |
| | Merging the above techniques in an attempt to benefit from the pros of some or eliminate the cons of the others. | Merging the above techniques to better detect and diagnose REB faults under highly nonlinear, nonstationary operating conditions; provides an online REB fault detection and diagnosis technique | The complexity of the combined method; may be difficult to interpret; merging two or more methods may result in a time-consuming and/or power-consuming issue | FDD | [127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139], [141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152] |
| | | | | Fault prognosis | |

Combining statistical algorithms with NN methods, J. B. Ali et al. [126] used PCA and LDA together with a PNN and a simplified fuzzy adaptive resonance theory map (SFAM) neural network for early online diagnosis of naturally progressing bearing degradations. The former were used for feature reduction and the latter for classification. An artificial NN method combined with a signal processing technique (i.e., the discrete wavelet transform (DWT)) was investigated to detect and diagnose the bearing faults of an industrial robot [127]. An ensemble SVM, as a statistical algorithm, was combined with composite multiscale fuzzy entropy (CMFE), as a signal processing method, for REB fault detection and diagnosis [128]. Other studies [129, 130] combined the ELM algorithm with other methods to better detect and diagnose the different REB faults. In [129], the ELM was used as a classifier and combined with multiscale intrinsic mode function permutation entropy, which extracted the feature parameters, after a preprocessing stage that denoised the original vibration signals using a wavelet as the prefilter. Tong et al. [130] proposed a fault diagnosis approach for REBs based on the redundant second-generation WPT and ELM.
A more recent work, published in the current year (2018), proposed a novel FDD method for REBs based on ensemble local characteristic-scale decomposition (ELCD) and the ELM (ELCD-ELM) algorithm [131]. First, numerous intrinsic scale components (ISCs) were obtained by decomposing the vibration signals using ELCD, and then different ISCs (in the time domain, energy, and relative entropy) were calculated to be the inputs to the ELM-based REB FDD. The proposed ELCD-ELM was found to be able to process nonstationary vibration signals and overcome the mode-mixing phenomenon of the LCD method.
4 Deep learning for REB PHM
Many SLbased techniques have been applied to the PHM field and investigated to detect, diagnose, and predict (sometimes) rolling element bearing health conditions, as reviewed and summarized in the previous section. Those SLbased techniques achieved decent performance, especially when detecting and diagnosing REB faults. However, few studies that deal with REB fault prognosis were found. Further, from surveying the abovereviewed SLbased REB PHM techniques, it is clear that the performance of those techniques depends greatly on extracting the bestsuited features, which were summarized in Table 1. Given the fact that the SLbased PHM techniques manually design and extract the features, in addition to the variety and the large amount of data in the PHM field, it can be concluded that those SLbased PHM techniques will face significant challenges in actually determining the bestsuited features to be extracted, especially in the big data scenario. Further, the SLbased PHM techniques have other challenges that come from the big data, such as the high dimensionality of feature space, the proliferation of multimodal data, and multicollinearity among data measurements [23]. Moreover, the four phases of the SLbased PHM technique shown in Fig. 2b cannot be optimized simultaneously (i.e., data processing, feature extraction, feature selection, and model training usually are done successively, not at the same time), which boosts the required processing time (i.e., timeconsuming issue) and increases complexity. Therefore, as a technique that has the capability to be a bridge that connects the big data from the machinery and the intelligent machine PHM methods, the DL based PHM technique is being adapted to the REB PHM field. This DLbased REB PHM method is known as a method that classifies different patterns via stacking multiple layers in hierarchical architectures and can model the highlevel representations behind data [21]. 
Further, the DLbased techniques are gaining popularity even in the PHM field because they can use the raw data directly (without any preprocessing, as shown in Fig. 2c) as an input, i.e., representation learning. They can learn complex and highly nonlinear representations from highdimensional data [153].
Although the deep learning is not a new concept, it has only recently started to gain more attention and to be successfully applied in different fields, such as computer vision, language and audio processing, and (automatic) recognition [153], [154]. It is only in the last few years that deep learning started to be applied to the PHM field [155, 156, 157]. To the authors’ knowledge, the deep learning technique was first applied to rolling element bearing prognostics and health management in the year 2015, except for a few works, such as the one from Liu et al. [158]. Liu et al. used sparse coding with a learned dictionary instead of a predefined one for adaptive feature extraction from the vibration signal for REB fault diagnosis; they introduced a natural extension of sparse coding, the shiftinvariant sparse coding algorithm. In other work, Verma et al. [159] proposed intelligent conditionbased monitoring of REMs using a sparse autoencoder method.
As mentioned in Sect. 3 and following the classification of deep learning methods in Zhao et al. [21], this paper thus classifies and reviews the existing studies on DLbased REB PHM into four groups: (deep) CNNbased, (deep) RNNbased, RBMbased DNN, and AEbased DNN approaches. In addition to briefly introducing the definition and principle of each algorithm with its typical structure, its application to REB PHM is outlined to highlight its challenges, its pros and cons, and its latest advancements.
4.1 (Deep) CNNbased REB PHM approaches
One deep learning technique is the convolutional neural network (CNN) approach. A CNN is a multilayer feedforward neural network that assumes its inputs are images [160]. It was inspired by neurons of the human visual cortex and has two key features [153]. One is local connections: since images are highly correlated within subregions and this correlation information is critical for recognizing them, subregions in the previous layer are connected to local patches in the feature maps by filters. The other is shared weights: a pattern can appear at various locations in an image, and by convolving the same filter across the image, the pattern can be extracted independent of its location. In addition, using the same filter across an image significantly reduces the number of parameters. Nowadays, many open-source CNN models are available (e.g., GoogLeNet, AlexNet), which makes them attractive to researchers.
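The two features can be illustrated on a one-dimensional signal, which is closer to a vibration record than an image. The sketch below is a simplified assumption-based example: it slides a single three-tap filter over a toy signal (cross-correlation, as is usual in CNN layers) and compares the number of filter parameters with that of a fully connected layer producing the same number of outputs. The function name and all values are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal ('valid' mode, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))  # local connection
        for i in range(len(signal) - k + 1)
    ]

# A short pattern placed at two different locations in a toy signal.
pattern = [1.0, -1.0, 1.0]
signal = [0.0] * 12
signal[2:5] = pattern
signal[8:11] = pattern

# Using the pattern itself as the filter gives the strongest response where it occurs.
response = conv1d_valid(signal, pattern)
peaks = [i for i, r in enumerate(response) if r == max(response)]

# Parameter count: one shared kernel vs. a dense layer with the same output size.
shared_params = len(pattern)
dense_params = len(signal) * len(response)
```

Here the same three weights respond at both positions (indices 2 and 8), whereas a fully connected layer with the same 10 outputs would need 120 weights.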
The use of convolutional neural networks (CNNs) with a deep structure, from one layer up to three layers, on raw signals was investigated in [161] to test their accuracy as classifiers on bearing fault data, including their effectiveness when the input signals were corrupted with noise. More works were carried out to deal with REB PHM based on the DCNN (deep CNN) [156] or a modified DCNN, i.e., the hierarchical adaptive DCNN [162] and energy-fluctuated multiscale feature learning with a deep ConvNet for intelligent spindle bearing fault diagnosis [163]. CNN-based bearing fault detection was proposed in [164], where the CNN was considered as a feature-learning model for condition monitoring, so that it can autonomously learn useful features for bearing fault detection from the data itself.
Last year (2017), several papers [165, 166, 167, 168, 169, 170, 171] used CNN-based deep learning to detect and diagnose REB faults. Thus, there is a clear tendency toward applying such deep learning techniques to REB fault detection and diagnosis tasks; however, no study has yet considered the prognostic task, which remains to be pursued.
A hybrid method was proposed by You et al. that benefits from the feature-learning capability of the CNN method, as a deep learning technique, and the generalization ability of support vector regression (SVR) [165]. In [166], the method was used to detect and diagnose different bearing faults as well as gear faults. The proposed hybrid model, CNN-SVR, was constructed by replacing the top layer of the traditional CNN with an SVR classifier; the new model was then stacked layer by layer with convolutional layers and pooling layers inside. The hybrid model consists of 10 layers in total: the input layer, three convolutional layers, three pooling layers, two fully connected layers, and a support vector regression classifier as the top layer. In [167], [168], and [169], the DCNN was applied as a deep learning technique that was combined with other methods to deal with REB FDD. Zhang et al. [167] proposed a novel method named DCNN with wide first-layer kernels (WDCNN); Fuan et al. [168] utilized a DCNN with a particle swarm optimization method and the t-distributed stochastic neighbor embedding (t-SNE) technique. Li et al. [169] proposed IDSCNN, which is based on an ensemble DCNN and an improved Dempster–Shafer theory based on an evidence fusion technique. Another paper, by Xie and Zhang [170], combined the CNN with a feature extraction algorithm based on the EMD method, with attention to extracting distinguishing features (compressed features with spatial information) to address the non-stationary characteristics of the original vibration signals. Finally, Lu et al. [171] investigated a new hierarchical network of CNN-based deep learning for bearing fault diagnosis under fluctuating working conditions and noisy environments, making use of cognitive computing theory.
4.2 (Deep) RNNbased REB PHM approaches
Almost all of the papers found [175, 176, 177, 178, 179, 180] used the RNN as a tool not only for REB fault diagnosis but also for prognosis; the exception is Abed et al. [175], who used dynamic recurrent neural networks (DRNNs), which can learn the dynamics of nonlinear systems, whereas conventional static neural networks cannot. The DRNN was fed with orthogonal fuzzy neighborhood discriminant analysis (OFNDA) features for real-time REB FDD.
Malhi et al. [176] preprocessed vibration signals from defect-seeded REBs using the CWT and then used a competitive learning-based approach built on the RNN algorithm for long-term prognosis. Different statistical parameters were utilized as inputs to the RNN, which were clustered based on the principle of competitive learning to effectively represent the bearing defect propagation. The results showed that the RNN did not work well in a short-term prediction case, but for long-term prediction, the RNN did increase the training speed and achieved good prognostic results. Sharma et al. [177] proposed a robust fault analysis method to diagnose and predict the level of fault severity of a REB. They used the DWT for feature extraction and an orthogonal fuzzy neighborhood discriminative analysis (OFNDA) technique for feature reduction. Finally, a DRNN method was used to predict the REB conditions and classify their different faults. Xie and Zhang [178] used two methods, the echo state network (ESN) and the recurrent multilayer perceptron (RMLP), which are both types of RNN, for vibration-based REB fault prognosis. The two methods were able to predict the REB health condition in a relatively short time and with only limited data available, contrary to the autoregressive moving average (ARMA) and SVM methods. More recently, the RNN was used as the main tool for REB prognosis [179], [180]; in [180], it was applied in the time and the frequency domains, and the test results showed that the RNN can be used for fault prognosis in general and for bearing health conditions in particular. These prior studies showed promising results regarding the ability of RNNs to predict the RUL, which is an important factor in decision-making to alleviate emergency situations. Thus, the use of an RNN for REB fault prognosis is worth further in-depth study.
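What distinguishes the recurrent networks reviewed above from static networks is that the hidden state at each time step depends on the previous state as well as the current input. The following sketch shows this recurrence for a basic Elman-style RNN with a tanh activation. It is an untrained forward pass only, with randomly drawn weights and a hypothetical health-indicator sequence; it does not reproduce the DRNN, ESN, or RMLP models of the cited studies.

```python
import math
import random

def rnn_forward(sequence, w_x, w_h, b, w_y):
    """Run an Elman-style RNN over a scalar sequence; return one output per step."""
    n_hidden = len(b)
    h = [0.0] * n_hidden                       # initial hidden state
    outputs = []
    for x in sequence:
        # The new state depends on the current input and the previous state.
        h = [
            math.tanh(w_x[i] * x + sum(w_h[i][j] * h[j] for j in range(n_hidden)) + b[i])
            for i in range(n_hidden)
        ]
        outputs.append(sum(w_y[i] * h[i] for i in range(n_hidden)))
    return outputs

rng = random.Random(0)
n_hidden = 4
w_x = [rng.uniform(-1, 1) for _ in range(n_hidden)]                       # input weights
w_h = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]  # recurrent weights
b = [0.0] * n_hidden
w_y = [rng.uniform(-1, 1) for _ in range(n_hidden)]                       # output weights

# Hypothetical, slowly growing health indicator (e.g., an RMS trend).
indicator = [0.1 * t for t in range(10)]
outputs = rnn_forward(indicator, w_x, w_h, b, w_y)
```

Because the state is carried forward, the output at a given step reflects the history of the sequence, which is the property that makes such networks candidates for degradation tracking and RUL prediction.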
4.3 RBMbased DNN approaches for REB PHM
Deep neural networks (DNNs) belong to the category of artificial NNs, but they are generally superior since they are known to have strong representation-learning power. A DNN whose architecture is built with a layer-by-layer deep learning technique is able to deal with the issue of local optima when training the parameters of the network [111]. A DNN structure can be built either by the restricted Boltzmann machine (RBM) or by the autoencoder (AE) technique. In the next two subsections, research on the RBM-based DNN and the AE-based DNN for REB PHM is reviewed, respectively. Thus, a brief description of the RBM is presented first, with variant models that use it as the basic learning module, i.e., deep-belief networks (DBN) and the deep Boltzmann machine (DBM). Then, a comprehensive review of existing studies examining the RBM-based DNN for REB PHM is presented.
The DBN is constructed by stacking multiple RBMs, as can be seen in Fig. 12b. Thus, the DBN is a multilayer NN that has stochastic latent variables (hidden units) and is a generative graphical model [182]. The DBN is trained in two steps: first, an unsupervised layer-wise pretraining (RBM 1, RBM 2, and RBM 3 in Fig. 12b), and then a supervised fine-tuning (fully connected (FC) layer in Fig. 12b). In pretraining, each hidden layer serves as the visible layer for the next layer.
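The unsupervised pretraining step can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: binary visible and hidden units, one-step contrastive divergence (CD-1) as the RBM learning rule, and arbitrary layer sizes, learning rate, and toy data. The class and function names are hypothetical, and the supervised fine-tuning step is not shown.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class RBM:
    """Binary restricted Boltzmann machine trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr=0.1):
        p_h0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in p_h0]  # sample hidden units
        v1 = self.visible_probs(h0)                                  # reconstruction
        p_h1 = self.hidden_probs(v1)
        # Move the weights toward the data statistics and away from the reconstruction.
        for i in range(len(v0)):
            for j in range(len(p_h0)):
                self.w[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(p_h0)):
            self.b_h[j] += lr * (p_h0[j] - p_h1[j])

def pretrain_stack(data, layer_sizes, epochs=20, seed=0):
    """Greedy layer-wise pretraining: each RBM's hidden output feeds the next RBM."""
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v)
        rbms.append(rbm)
        # The hidden layer of this RBM becomes the visible layer of the next one.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input

# Toy binary patterns standing in for binarized signal features; purely illustrative.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
rbms, top_features = pretrain_stack(data, layer_sizes=[4, 3, 2])
```

After pretraining, the top-level features would be passed to a supervised layer and the whole stack fine-tuned with labeled data.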
In contrast to a DBN, the DBM is built by grouping hidden units into a hierarchy of layers instead of a single one. Thus, the DBM is simply a deep structured RBM, where any adjacent layers can be connected, but nonadjacent layers cannot be connected. In addition, no connection is permitted within units of the same layer. The DBM adopts learning a complex, fully connected Boltzmann machine, in which each layer captures complicated, higher order correlations between the activities of hidden features in the layer below [183].
First, the RBMbased DNN structure that uses RBM as the basic learning module, i.e., the deepbelief network (DBN), was employed as a bearing condition monitoring tool to overcome the presence of noise and transient impacts in the acquired vibration signals in [184]. Another research paper that uses an optimization DBN method to deal with REB fault diagnosis was achieved by Shao et al. [185]. Different research works have been performed by combining the DBN with other techniques to improve the REB detection, diagnosis, and prognosis capability. Wang et al. [186] proposed a bearing fault diagnosis method based on the Hilbert envelope spectrum and a DBN. Getting the right parameters of the DBN is crucial, however, it can be timeconsuming due to the training process. Thus, a research study in [187] was proposed to deal with this issue and to avoid both the overfitting and the underfitting problems. An assessment of the bearing degradation based on the Weibull distribution and a DBN was investigated by Ma et al. [188]. Bearing fault diagnosis based on a DBN and multisensor information fusion techniques was carried out based on use of multivibration signals to adaptively fuse multifeature data and identify various bearing faults [189]. Yin et al. [190] developed a combined machine health assessment model based on an Isomap and a DBN, which effectively evaluated the degradation of the bearing health conditions, since it was found to be more sensitive to the incipient faults. A twolayer hierarchical diagnosis network (HDN) [191] that deals with REB diagnosis in two stages was carried out using a wavelet packet energy feature. The bearing fault types were identified by the first layer, then their severity ranking was recognized in the second layer. 
Finally, the HDN was compared to two similar networks constructed by SVM, and to a backpropagation neuron network (BPNN); according to the experimental results, it could deal with the presence of noises and disturbances that gave rise to the overlapping problem among the different fault classes and was more reliable for precise, multistage diagnosis.
One critical challenge for performing prognosis of bearings in the era of the IoT and 4th Industrial Revolution is to automatically process massive amounts of data and accurately predict the RUL of bearings. Recently, a study of Deutsch et al. [192] addressed the limitations of SLbased REB prognostics, and presented a new method that integrates a DBN and a particle filter for RUL prediction of hybrid ceramic bearings; the study then compared the results with DBN and particle filterbased approaches. The validation and comparison results showed promising RUL prediction performance of the integrated method. Early bearing fault diagnosis using effective feature selection methods was proposed by Devendiran et al. [193]; these researchers used a DBN as one of the neural network classification algorithms. In contrast to the conventional fault diagnosis and classification methods, which usually do not consider the temporal coherence of time series data, Zhang et al. [194] proposed a REB FDD model based on a DBN. It can directly recognize raw time series sensor data without feature selection and signal processing. It also takes advantage of the temporal coherence of the data, thus, expertise in feature selection and signal processing is not required.
In the current year, 2018, three papers have already been published. All of them used the DBNbased deep learning method to deal with REB PHM. In [195], in contrast to the shallow learning methods, which require establishing explicit model equations and much prior knowledge (and therefore are limited in the age of big data as explained in the previous section), this paper presented a deep learningbased approach for RUL prediction of rotating components with big data. The developed deep learningbased approach was a DBNfeedforward neural network (DBNFNN) algorithm that takes advantage of the selftaught, featurelearning capability of the DBN and the predicting power of the FNN; together, these strategies overcome the abovementioned limitations. A novel convolutional deepbelief network (CDBN) was proposed for REB PHM in [196]. First, an autoencoder was used to compress data and reduce the dimension. Second, a novel CDBN was constructed with Gaussian visible units to learn the representative features. Finally, the exponential moving average (EMA) was considered to improve the performance of the constructed deep model. Another study was performed by Oh et al. [197] where the researchers developed a DBNbased deep learning method with vibration images as the inputs. The developed method was found scalable, due to the fact that the vibration imaging approach devised incorporates data from systems with various scales, such as small testbeds and real fielddeployed systems. Further, the method was proposed for unsupervised feature engineering. The proposed DBNbased deep learning algorithm was pretrained for highlevel feature extraction, where a large amount of field data without any label can be incorporated since pretraining can be achieved in an unsupervised manner. Then, the pretrained DBN was finetuned by combining it with a multilayer perceptron (MLP), leading to a fault classifier. 
The pretrained DBN could also be used as a fault cluster by combining it with a selforganizing map (SOM).
Second, an RBMbased DNN structure that stacks multiple RBMs, i.e., deep Boltzmann machine (DBM), was investigated and used for REB condition monitoring in [198]. In the study, several time, frequency, and time–frequency domain features were extracted from an acquired data set with seven fault patterns to assess the performance of the proposed DBM for REB fault diagnosis. The seven parameters were used as the input parameters of the DBM model. Their results clarify the accuracy and reliability of the DBM model. An enhanced RBM was considered with prognosability regularization for prognostics and health assessment of the REBs. The proposed DBM method was benchmarked with deep structure of the regular RBM algorithm and the PCA [199]. A scoring method based on the benchmarking score was used to evaluate each PHM method in its ability to predict the RUL. He et al. [200] proposed a novel bearing diagnosis method based on the Gaussian restricted Boltzmann machine (Gaussian RBM) algorithm using vibration signal data. The envelope spectrums were used directly as the feature vectors to represent the fault types of the bearing and then classified using the proposed Gaussian RBM algorithm.
The deepstatistical feature learning (DSFL) of the machinery condition health monitoring can be constructed by GaussianBernoulli deep Boltzmann machine (GDBM) [201], where in the GDBM, each neuron in the intermediate layers is connected to both topdown and bottomup information, unlike in other RBMbased deep models, such as the DBN and the deep autoencoder. For deep learning of statistical features with unknown value boundaries, realvalue GaussianBernoulli restricted Boltzmann machines (GRBMs) were stacked to develop the GDBMbased DSFL method in [202] and were applied for both bearing and gearbox systems. Deutsch and He [203] dealt with bearing RUL prediction with big data based on a deep learning technique that used the DBM, which predicts L steps ahead in the future, to predict the RUL by predicting the RMS values and the time of the bearing’s failure.
4.4 AEbased DNN approaches for REB PHM
The AE-based trained layers can be stacked into a new network, which is an AE-based DNN. By training and stacking various layers of AEs, diverse structures of AE-based DNN can be generated to extract features that represent the health states of various engineered systems, as shown in Fig. 13b. Moreover, a deeper layered structure than the one in Fig. 13a is another widely used form of AE-based DNN, which can discover highly representative features from extremely complex signals.
The AE was applied to rolling element bearing fault diagnosis by W. Lu et al. [205] to extract features from the raw signal and guarantee sensitivity to every fault category of interest, so as to avoid incomplete diagnosis results and the appearance of unknown-category faults. Six classes of bearing faults were considered to evaluate the proposed method after data preprocessing using the FFT to generate inputs with a length of 600 points. The built DNN was a two-layer structure with 800 and 400 neurons in the hidden layers, which were constructed by an AE. The authors concluded that the constructed DNN-based AE could extract useful features and that further studies should be carried out to better classify the fault categories with high accuracy and to handle the unknown-category fault cases. F. Jia et al. [206] stated that the AE-based DNN method could overcome the two issues hindering ANN-based intelligent fault diagnosis of rotating machinery: (1) the need for prior expertise and knowledge to manually extract fault features, and (2) the limitation in learning the complex nonlinear relationships in fault diagnosis. Thus, F. Jia et al. [206] suggested a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data, based on AEs with deep architectures instead of shallow ones. Another study [207] took advantage of the learning capability of the AE and combined it with the digital wavelet frame (DWF) and a nonlinear soft threshold method to first denoise the fault vibration signal. The study applied a stacked autoencoder (SAE) to extract the features, which were the inputs to a BP network classifier. A deep learning stacked denoising autoencoder (SDAE) method [208] was employed to extract features from the non-stationary and nonlinear characteristics of bearing vibration signals.
The deep learning SDAE, combined with dropout, was found to be useful to learn good representations of those features and improve fault pattern classification robustness.
The AE-based DNN was also used to classify REB fault classes under two and three hidden layers in an unsupervised manner using the encoder part of the AE [209]. L. Guo et al. [210] suggested a new multi-feature extraction and nonlinear dimension reduction algorithm based on deep learning for bearing condition recognition. Different time domain, frequency domain, and time–frequency domain features were calculated and then their dimension was reduced. Finally, the different bearing faults were classified using a top-layer classifier of AE-based DNN outputs.
S. Tao et al. [211] combined the SAE with the softmax regression method, a classification method that generalizes logistic regression to multiclass problems, to address the bearing fault diagnosis problem. Their results showed that combining the SAE with softmax regression was strongly robust and markedly reduced the impact of noise. H. Liu et al. [212] were the first to combine the short-time Fourier transform (STFT) and the stacked sparse autoencoder (SSAE) with softmax regression to automatically extract features from sound signals and to classify the different REB fault modes, respectively. The effectiveness of the proposed STFT-SSAE method was investigated and compared with empirical mode decomposition (EMD), the Teager energy operator (TEO), and the SSAE, and its performance when deploying the PCA technique for dimensionality reduction was evaluated. Taking advantage of the high training speed of the ELM method and the feature extraction capability of the AE, Mao et al. [213] proposed another deep learning algorithm, named the “AE-ELM-based diagnosis method,” to build a universal feature extraction and fast-training method to deal with the different REB diagnosis issues. High-level features were extracted in [214] from the frequency domain and the wavelet packet transform domain using a DNN. First, the weights of the DNN were initialized using stacked denoising sparse autoencoders (SDSAE); then, those weights were fine-tuned based on softmax regression and centering towards the median. These high-level features were then classified using the SVM and the random forest technique simultaneously. In [215], the SDAE was also applied to remove random noise from the raw signals and to represent fault features in fault pattern diagnosis for both rolling bearing faults and gearbox faults. The SDAE was trained in a greedy, layer-wise fashion. The proposed method was compared to the DBN algorithm in a highly noisy environment, and the results showed its superiority for fault diagnosis.
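The softmax regression step used as the top-layer classifier in several of these studies can be sketched as follows. This is a minimal illustration of the standard softmax function applied to linear class scores; the weights, biases, and two-dimensional feature vector are made-up values, not parameters from any reviewed work.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(features, weights, biases):
    """Return (predicted class index, class probabilities) for one feature vector."""
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return probs.index(max(probs)), probs


# Hypothetical three-class problem with a two-dimensional learned feature.
weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
label, probs = predict_class([1.0, 0.0], weights, biases)
print(label, probs)
```

With two classes, the softmax probabilities reduce to those of ordinary logistic regression, which is the sense in which the method generalizes it.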
More recent works, accepted or published in 2017 and 2018, have investigated the AE method or its variants for use in REB fault diagnosis. Chen and Li [216] proposed multisensor feature fusion for bearing fault diagnosis using a sparse autoencoder and a DBN. Lu et al. [217] solved the health state identification problem in REB fault diagnosis using an SDAE method. Another deep learning method, named “automated AE correlation-based (AEC),” was developed by Hasani et al. [218]; it was used for health monitoring and prognostics of machine bearings. A hybrid feature pool method was proposed in [219] and combined with SAE-based DNNs to perform effective diagnosis of REB faults of multiple severities. The authors found that the hybrid feature pool could extract more discriminating information from the raw vibration signals to overcome the nonstationary behavior of the signals caused by multiple crack sizes; the proposed method outperformed the SVM and the BPNN. In [220], locality preserving projection (LPP) was adopted to fuse the deep features and thus to build a new deep AE method, constructed with a denoising autoencoder (DAE) and a contractive autoencoder (CAE), that enhances feature-learning ability with the goal of diagnosing REB faults. A hybrid deep model consisting of a multi-channel CNN followed by a stack of denoising autoencoders (MCNN-SDAE) was developed by A. Shaheryar et al. [221] for fault identification in rotary machines. In that study, the researchers explored the MCNN for unsupervised feature learning on vibration signals and the SDAE for extracting vibration features that are robust and invariant to noise in the vibration signals.
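The denoising autoencoders mentioned above are trained to reconstruct a clean input from a deliberately corrupted copy of it. A minimal sketch of one common corruption scheme, masking noise, is given below; the function name and the corruption rate are assumptions for illustration, and the reviewed studies may use other noise models (for example, additive Gaussian noise).

```python
import random


def mask_inputs(x, corruption_rate, seed=None):
    """Return a copy of x in which each entry is zeroed with probability corruption_rate."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < corruption_rate else xi for xi in x]


clean = [0.8, -1.2, 0.3, 2.1, -0.5, 1.7]
corrupted = mask_inputs(clean, corruption_rate=0.5, seed=1)
# During training, the network receives `corrupted` as input
# and is penalized for its reconstruction error against `clean`.
print(corrupted)
```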
Summary of the reviewed DL-based REB PHM methods

| DL-based REB PHM algorithms | Principle | Pros | Cons | Application | References |
| --- | --- | --- | --- | --- | --- |
| (Deep) CNN approach | CNN is simply a feedforward NN of multiple layers, which assumes inputs as images | Less complexity in terms of the required number of neurons compared to the artificial NN; many open networks are available: GoogLeNet, AlexNet, VGG, and Clarifai [108], [153]; can handle nonlinear data and noisy signals | High network complexity (i.e., many layers) is needed to model high hierarchical training data; high computational cost; weak generalization ability | FDD | [161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172] |
| | | | | Fault prognosis | None |
| (Deep) RNN approach | RNN is a deep NN structure that applies the same weights recursively over structured inputs | Can handle nonlinear data; powerful in analyzing sequential information; a very deep network in which the current output depends on all the past data; works well with short-term information | Can frequently face a vanishing gradient problem during backpropagation for model training; can face difficulties when dealing with long-term information; huge amounts of data are needed for training | FDD | |
| | | | | Fault prognosis | |
| RBM-based DNN approaches | | | | | |
| DBN-based | The DBN is a NN of multiple layers, which has stochastic latent variables (hidden units) and a generative graphical model; it is constructed by stacking multiple RBMs | Can deal with the issue of local optima when fixing the parameters of the network by benefiting from a regularization method such as dropout, L2 regularization, etc.; has strong power of representation; does not need much prior knowledge or much expert knowledge; can handle nonlinear data; considers the temporal coherence of time series data | Significant computations are needed, especially in the training procedure that requires initialization and sampling; time-consuming due to the optimization process | FDD | [185, 186, 187, 188], [190], [192], [193], [195], [196], [198], [199] |
| | | | | Fault prognosis | |
| DBM-based | DBM is constructed by grouping hidden units into a hierarchy of layers instead of a single one. The DBM thus is simply a deep structured RBM | Can handle nonlinear data; robust when dealing with obscured data; top-down feedbacks are integrated | Time complexity for the inference is higher than that of a DBN [212]; cannot handle big data well, especially during the optimization of the network parameters | FDD | |
| | | | | Fault prognosis | |
| AE-based approaches and their variants | AE is a feedforward NN with an unsupervised machine learning structure that aims at accurately predicting the output. Further, it reduces the data dimensionality during the encoder phase | Can extract features from the raw signal and guarantee sensitivity to each considered fault, thus avoiding incomplete diagnosis results and the appearance of unknown-category faults; an unsupervised learning technique; does not need much prior knowledge or much expert knowledge; can directly handle complex nonlinear data | During the propagation, errors can appear; requires a pretraining phase; the possibility of the sparse representation | FDD | [158], [207], [208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219], [221, 222, 223, 224, 225] |
| | | | | Fault prognosis | [220] |
5 Summary and concluding remarks
This paper has presented a comprehensive review and summary of recent techniques aimed at REB fault detection, diagnosis, and prognosis, and their applications. It attempts to represent the widespread, contemporary REB PHM techniques by considering two main categories: shallow learning algorithms and deep learning methods. First, the different bearing failure modes were briefly described, focusing on fatigue, wear, plastic deformation, corrosion, electrical erosion, and fracture and cracking modes. Then, the different health features (indexes, criteria) used by these contemporary REB PHM techniques were thoroughly described (with their physical meaning where applicable, since some of the features do not have any) to provide an overall background for researchers, system engineers, and experts, in the general PHM field and in the specific REB PHM field, to select and adopt the best fit for their specific applications.
Several SL-based algorithms were found to have been applied to REB PHM systems; some originated from artificial neural networks (NN), and some did not. Thus, three categories were proposed in this paper: (1) statistical approaches, which were divided into LDA-based REB PHM, SVM-based REB PHM, KNN-based REB PHM, ELM-based REB PHM, and other statistical algorithms for REB PHM; (2) NN approaches; and (3) combined methods. Further, DL-based REB PHM techniques were also reviewed and classified into four groups, as follows: (1) (deep) CNN methods; (2) (deep) RNN methods; (3) RBM-based DNN methods, subdivided into DBN-based REB PHM and DBM-based REB PHM; and (4) AE-based DNN methods. Furthermore, the principles, the pros and cons, the advancements, and the applications of these SL-based and DL-based REB PHM methods were reviewed and summarized.

Although both SL-based and DL-based REB PHM techniques achieve good results in detecting and diagnosing different REB faults (sometimes reporting 100% accuracy), they are still not adopted in industry, owing to a lack of studies that consider how these contemporary techniques will be applied in practice. It would therefore be valuable for academics and industrial experts to work together to study and adopt these strategies. Different scales, different fault modes (i.e., single failure modes as well as compound failures), and different bearing types, such as journal bearings and magnetic bearings, which are increasingly incorporated in real-world applications, should also be studied. Furthermore, it is recommended that companies and industry experts share their data (healthy, faulty, and run-to-failure data) with academics, who usually use only data collected from their in-lab test benches; this shared data would help to achieve better advancements not only in research but also in practical industry use.

It is well known in the PHM field that, if a sufficiently accurate reference model exists, a model-based technique is the best choice for detecting, diagnosing, and predicting faults. Thus, incorporating dynamic models of REBs could improve the accuracy of REB PHM methods. Further, since fault data are rare and hard to obtain from modern engineered systems, researchers can benefit from the recently developed generative adversarial network (GAN) technique for generating faulty data.
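For reference, the standard GAN training objective can be written as the following two-player minimax problem. The notation is an assumption introduced here and does not come from this review: \(G\) is the generator that maps noise \(z\) to synthetic samples (here, synthetic fault data), \(D\) is the discriminator that estimates the probability that a sample is real, \(p_{\mathrm{data}}\) is the distribution of measured data, and \(p_z\) is the noise prior.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In a data-augmentation setting, samples \(G(z)\) drawn after training would supplement the scarce measured fault data; whether such samples are realistic enough for REB diagnosis would still need to be validated.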

Although there have been significant advancements in the development of both SL-based and DL-based REB PHM techniques, there is still no formula or law for selecting the optimal network geometry or hyperparameters (e.g., number of layers) to achieve the best results in detecting and diagnosing bearing faults and (ultimately) predicting health conditions. Thus, providing a standardized platform, or at least a guideline for how deep those algorithms should be, while considering the fact that most companies lack the software, modeling capability, and expertise to understand those algorithms deeply and to interpret their results, will enable integration of these contemporary techniques into real-world applications.

Finally, nearly all existing REB PHM studies, whether based on shallow learning or deep learning techniques, have targeted only the REB fault detection and diagnosis (condition monitoring) problem. Very few studies were found that deal with REB prognosis, that is, predicting the remaining useful lifetime (RUL) in order to provide a better condition-based maintenance (CBM) strategy. Strategies that begin to enable improved CBM will be of great interest to the rolling element bearing PHM field in particular, and to PHM for any modern engineered system in general, especially in the forthcoming years in the age of IoT and big data.
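To illustrate what an RUL prediction involves in its simplest form, the sketch below fits a straight line to a degradation indicator and extrapolates it to a failure threshold. This is a deliberately naive baseline shown only for orientation; the linear trend, the function name, the health index values, and the threshold are all assumptions and do not represent a method from the reviewed literature.

```python
def estimate_rul(times, health_index, failure_threshold):
    """Estimate remaining useful life by linear extrapolation of a health index.

    Returns the time from the last observation until the fitted line reaches
    failure_threshold, or None if no increasing degradation trend is found.
    """
    n = len(times)
    t_mean = sum(times) / n
    h_mean = sum(health_index) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, health_index))
    slope = sxy / sxx  # least-squares slope of the degradation trend
    if slope <= 0:
        return None
    intercept = h_mean - slope * t_mean
    t_fail = (failure_threshold - intercept) / slope
    return max(t_fail - times[-1], 0.0)


# Hypothetical health index (e.g., a normalized vibration feature) sampled hourly.
rul = estimate_rul([0, 1, 2, 3], [0.0, 0.1, 0.2, 0.3], failure_threshold=1.0)
print("estimated RUL (hours):", rul)
```

Real bearing degradation is typically nonlinear and noisy, which is one reason data-driven prognosis remains an open problem.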
Notes
Acknowledgements
This research was supported by Korea Electric Power Corporation (R17TH02), the Basic Research Lab Program through the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (No. 2018R1A4A1059976), and a grant from the Institute of Advanced Machinery and Design at Seoul National University (SNUIAMD).
References
 1. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
 2. V. Venkatasubramanian, R. Rengaswamy, S.N. Kavuri, K. Yin, A review of process fault detection and diagnosis Part III: process history based methods. Comput. Chem. Eng. 27(3), 327–346 (2003)
 3. S.X. Ding, P. Zhang, T. Jeinsch, E.L. Ding, P. Engel, W. Gui, A survey of the application of basic data-driven and model-based methods in process monitoring and fault diagnosis. Preprints of the 18th IFAC World Congress Milano (Italy) (2011), pp. 12380–12388
 4. S. Yin, S.X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring. IEEE Trans. on Industrial Electronics 61(11), 6418–6428 (2014)
 5. C. Hu, B.D. Youn, P. Wang, Time-dependent reliability analysis in operation: prognostics and health management. in Engineering Design Under Uncertainty and Health Prognostics. Springer Series in Reliability Engineering (Springer, Cham, 2019). pp. 233–301
 6. C. Hu, B.D. Youn, P. Wang, Case studies: prognostics and health management (PHM). in Engineering Design Under Uncertainty and Health Prognostics. Springer Series in Reliability Engineering (Springer, Cham, 2019). pp. 303–342
 7. I. Shin, J. Lee, J.Y. Lee, K. Jung, D. Kwon, B.D. Youn, A framework for prognostics and health management applications toward smart manufacturing systems. International Journal of Precision Engineering and Manufacturing-Green Technology 5, 519–538 (2018)
 8. C. Hu, B.D. Youn, P. Wang, J.T. Yoon, Ensemble of Data-Driven Prognostic Algorithms for Robust Prediction of Remaining Useful Life. Reliability Engineering and System Safety 103, 120–135 (2012)
 9. G. Niu, J. Jiang, B.D. Youn, M. Pecht, Autonomous health management for PMSM rail vehicles through demagnetization monitoring and prognosis control. ISA Trans. 72, 245–255 (2018)
 10. C. Hu, B.D. Youn, P. Wang, Engineering Design Under Uncertainty and Health Prognostics (Springer, Cham, 2018). ISBN 9783319925721
 11. A. Rai, S.H. Upadhyay, A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol. Int. 96, 289–306 (2016)
 12. S. Choi, B. Akin, M. Rahimian, H. Toliyat, Performance-oriented electric motors diagnostics in modern energy conversion systems. IEEE Trans. Ind. Elect. 59(2), 1266–1277 (2012)
 13. C. Lanham, Understanding the tests that are recommended for electric motor predictive maintenance. Baker Instrument Company (2002)
 14. S. Nandi, H. Toliyat, X. Li, Condition monitoring and fault diagnosis of electrical motors: a review. IEEE Trans. Energy Convers. 20(4), 719–729 (2005)
 15. IEEE Motor Reliability Working Group, Report of large motor reliability survey of industrial and commercial installations. IEEE Trans. Industrial Appl. 21(4), 853–872 (1985)
 16. W. Zhou, T.G. Habetler, R.G. Harley, Bearing condition monitoring methods for electrical machines: a general review. Proc. IEEE SPEEDAM, 6–8 (2007)
 17. M. Hamadache, D. Lee, K.C. Veluvolu, Rotor speed-based bearing fault diagnosis (RSBBFD) under variable speed and constant load. IEEE Trans. Ind. Electro. 62(10), 6486–6495 (2015)
 18. S.X. Ding, Model-based fault diagnosis techniques: design schemes, algorithms and tools (Springer, Germany, 2008)
 19. S. Schuet, D. Timuçin, K. Wheeler, Physics-based precursor wiring diagnostics for shielded-twisted-pair cable. IEEE Trans. on Instrum. and Measurement 64(2), 378–391 (2015)
 20. J. Liu, W. Luo, X. Yang, L. Wu, Robust model-based fault diagnosis for PEM fuel cell air-feed system. IEEE Trans. on Industrial Electronics 63(5), 3261–3270 (2016)
 21. R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, R.X. Gao, Deep learning and its applications to machine health monitoring: a survey. Journal of Latex Class Files 14(8), 1–13 (2015)
 22. G. Zurita, V. Sánchez, D. Cabrera, A review of vibration machine diagnostics by using artificial intelligence methods. Investigación & Desarrollo 1(16), 102–114 (2016)
 23. J. Wang, Y. Ma, L. Zhang, R.X. Gao, D. Wu, Deep learning for smart manufacturing: methods and applications. J. Manuf. Syst. 13. Available online Jan. 2018 (In Press)
 24. B. Sung, J. Lee, Reliability improvement of machine tool changing servo motor. Journal of International Council on Electrical Engineering 1(1), 28–32 (2011)
 25. J. Slavic, A. Brkovic, M. Boltezar, Typical bearing-fault rating using force measurements: application to real data. J. Vib. Control 17(14), 2164–2174 (2011)
 26. Emerson, Bearing failure analysis, ebook. http://www.emersonbearing.com/bearingfailuremodes (2017)
 27. ISO 15243 Rolling bearings: damage and failures - terms, characteristics and causes (2004)
 28. P.P. Kharche, S.V. Kshirsagar, Review of fault detection in rolling element bearing. Int. J. Innov Res Adv Eng (IJIRAE) 1(5), 169–174 (2014)
 29. S. Devendiran, K. Manivannan, S.C. Kamani, R. Refai, An early bearing fault diagnosis using effective feature selection methods and data mining techniques. Int. J. Eng. Technol. (IJET) 7(2), 583–598 (2015)
 30. L.S. Dhamande, M.B. Chaudhari, Compound gear-bearing fault feature extraction using statistical features based on time-frequency method. Measurement 125, 63–77 (2018)
 31. L. Gelman, T.H. Patel, B. Murray, A. Thomson, Rolling bearing diagnosis based on the higher order spectra. Int. J. Prog. Health Manag. 022 (2013) (ISSN 2153-2648)
 32. M. Hamadache, D. Lee, Principal component analysis based signal-to-noise ratio improvement for inchoate faulty signals: application to ball bearing fault detection. Int. J. Control Autom. Syst. 15(2), 506–517 (2017)
 33. J. Lin, Q. Chen, Fault diagnosis of rolling bearings based on multifractal detrended fluctuation analysis and Mahalanobis distance criterion. Mech. Syst. Signal Proc. 38(2), 515–533 (2013)
 34. P.H. Nguyen, J.M. Kim, Multifault diagnosis of rolling element bearings using a wavelet kurtogram and vector median-based feature analysis. Shock Vib. 215, 14 (2015). Article ID 320508
 35. W. Li, M. Qiu, Z. Zhu, B. Wu, G. Zhou, Bearing fault diagnosis based on spectrum images of vibration signals. Meas. Sci. Technol. 27(035005), 10 (2016)
 36. B. Attaran, A. Ghanbarzadeh, Bearing Fault Detection Based on Maximum Likelihood Estimation and Optimized ANN Using the Bees Algorithm. Journal of Applied and Computational Mechanics 1(1), 35–43 (2015)
 37. M. Hamadache, Rotor speed based bearing fault diagnosis using absolute value PCA, PhD Thesis, School of electronics Engineering, Kyungpook National University, (2015), pp. 50–54
 38. G. Georgoulas, G. Nikolakopoulos, Bearing fault detection and diagnosis by fusing vibration data. in IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society (2016), pp. 6955–6960
 39. J. Harmouche, C. Delpha, D. Diallo, Improved fault diagnosis of ball bearings based on the global spectrum of vibration signals. IEEE Trans. Energy Convers. 30(1), 376–383 (2015)
 40. J. Park, M. Hamadache, J.M. Ha, Y. Kim, K. Na, B.D. Youn, A positive energy residual (PER) based planetary gear fault detection method under variable speed conditions. Mechanical Systems and Signal Processing 117, 347–360 (2019)
 41. J.M. Ha, J. Park, K. Na, Y. Kim, B.D. Youn, Tooth-wise fault identification for a planetary gearbox based on a health data map. IEEE Trans. on Ind. Electronics 65(7), 5903–5912 (2018)
 42. J.H. Jung, B.C. Jeon, B.D. Youn, M. Kim, D. Kim, Y. Kim, Omnidirectional regeneration (ODR) of proximity sensor signals for robust diagnosis of journal bearing systems. Mechanical Systems and Signal Processing 90, 189–207 (2017)
 43. C. Hu, P. Wang, B.D. Youn, W. Lee, J.T. Yoon, Copula-based statistical health grade system against mechanical faults of power transformers. IEEE Trans. Power Deliv. 27(4), 1809–1819 (2012)
 44. B.D. Youn, B.C. Jeon, J.H. Jung, Apparatus and method for diagnosing rotor shaft. US Patent App. 15/239,987, June 2017
 45. J.M. Ha, H. Oh, J. Park, B.D. Youn, Classification of operating conditions of wind turbines for a class-wise condition monitoring strategy. Renewable Energy 103, 594–605 (2017)
 46. W. Zhou, T.G. Habetler, R.G. Harley, Bearing fault detection via stator current noise cancellation and statistical control. IEEE Trans. Ind. Electr. 55(12), 4260–4269 (2008)
 47. H. Zoubek, S. Villwock, M. Pacas, Frequency response analysis for rolling-bearing damage diagnosis. IEEE Trans. on Ind. Electr. 55(12), 4270–4276 (2008)
 48. M. Kang, J. Kim, L.M. Wills, J.M. Kim, Time-varying and multiresolution envelope analysis and discriminative feature analysis for bearing fault diagnosis. IEEE Trans. on Ind. Electr. 62(12), 7749–7761 (2015)
 49. F. Zhang, T. Zhang, H. Yu, A Novel rolling bearing fault diagnosis method. in 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2016) pp. 1148–1152
 50. D. Rossetti, Y. Zhang, S. Squartini, S. Collura, Classification of bearing faults through time-frequency analysis and image processing. in 2016 17th International Conference on Mechatronics-Mechatronika (ME)
 51. Z. Huo, Y. Zhang, P. Francq, L. Shu, J. Huang, Incipient fault diagnosis of roller bearing using optimized wavelet transform based multispeed vibration signatures. IEEE Access 5, 19442–19456 (2017)
 52. A.A. Krishnamurthy, M.N. Belur, D. Chakraborty, Comparison of various linear discriminant analysis techniques for fault diagnosis of Reusable Launch Vehicle. in 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, pp. 3050–3055 (2011)
 53. J. Harmouche, C. Delpha, D. Diallo, Linear discriminant analysis for the discrimination of faults in bearing balls by using spectral features. in 2014 First International Conference on Green Energy ICGE 2014, (2014), pp. 182–187
 54. T. Liu, J. Chen, X.N. Zhou, W.B. Xiao, Bearing performance degradation assessment using linear discriminant analysis and coupled HMM. J. Phys: Conf. Ser. 364(012028), 12 (2012)
 55. M. Zhao, X. Jin, Z. Zhang, B. Li, Fault diagnosis of rolling element bearings via discriminative subspace learning: visualization and classification. Expert Syst. Appl. 41, 3391–3401 (2014)
 56. X. Jin, M. Zhao, T.W.S. Chow, M. Pecht, Motor bearing fault diagnosis using trace ratio linear discriminant analysis. IEEE Tran. on Ind. Electr. 61(5), 2441–2451 (2014)
 57. L. Ciabattoni, G. Cimini, F. Ferracuti, A. Freddi, G. Ippoliti, A. Monteriù, A novel LDA-based approach for motor bearing fault detection. in 2015 IEEE 13th International Conference on Industrial Informatics (INDIN) (IEEE, 2015), pp. 771–776
 58. C.P. Mbo’o, K. Hameyer, Fault diagnosis of bearing damage by means of the linear discriminant analysis of stator current features from the frequency selection. IEEE Trans. Ind. Appl. 52(5), 3861–3868 (2016)
 59. W. Yan, H. Shao, Application of support vector machine nonlinear classifier to fault diagnoses. in Proceedings of the 4th World Congress on Intelligent Control and Automation, 2002, vol. 4, (IEEE, 2002), pp. 2697–2700
 60. V. Sugumaran, K.I. Ramachandran, Effect of number of features on classification of roller bearing faults using SVM and PSVM. Expert Syst. Appl. 38(4), 4088–4096 (2011)
 61. K.C. Gryllias, I.A. Antoniadis, A support vector machine approach based on physical model training for rolling element bearing fault detection in industrial environments. Eng. Appl. Artif. Intell. 25(2), 326–344 (2012)
 62. D. Fernández-Francos, D. Martínez-Rego, O. Fontenla-Romero, A. Alonso-Betanzos, Automatic bearing fault diagnosis based on one-class ν-SVM. Comput. Ind. Eng. 64(1), 357–365 (2013)
 63. G. Wang, Y. He, K. He, Multilayer kernel learning method faced on roller bearing fault diagnosis. J. Softw. 7(7), 1531–1538 (2012)
 64. X.M. Liu, J.W. Yin, Z.L. Feng, J. Dong, Incremental manifold learning via tangent space alignment. in Artificial Neural Networks in Pattern Recognition, (Ulm, Germany, 2006), pp. 107–121
 65. X. Li, A. Zheng, X. Zhang, C. Li, L. Zhang, Rolling element bearing fault detection using support vector machine with improved ant colony optimization. Measurement 46(8), 2726–2734 (2013)
 66. D. Hwang, Y. Youn, J. Sun, K. Choi, J. Lee, Y. Kim, Support vector machine based bearing fault diagnosis for induction motors using vibration signals. J. Electr. Eng. Technol. 10, 30–40 (2015)
 67. R. Liu, B. Yang, X. Zhang, S. Wang, X. Chen, Time-frequency atoms-driven support vector machine method for bearings incipient fault diagnosis. Mech. Syst. Signal Proc. 75, 345–370 (2016)
 68. Y. Li, M. Xu, Y. Wei, W. Huang, A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree. Measurement 77, 80–94 (2016)
 69. M.M. Manjurul Islam, J. Kim, S.A. Khan, J. Kim, Reliable bearing fault diagnosis using Bayesian inference-based multiclass support vector machines. J. Acoust. Soc. Am. 141(2), 7 (2017)
 70. N. Zhang, L. Wu, J. Yang, Y. Guan, Naive bayes bearing fault diagnosis based on enhanced independence of data. Sensors 18(463), 17 (2018)
 71. M. Tabaszewski, Optimization of a nearest neighbors classifier for diagnosis of condition of rolling bearings. Diagnostyka 15(1), 37–42 (2014)
 72. C.K. Mechefske, J. Mathew, Fault detection and diagnosis in low speed rolling element bearings Part II: The use of nearest neighbour classification. Mech. Syst. Signal Proc. 6(4), 309–316 (1992)
 73. Y. Lei, Z. He, Y. Zi, A combination of WKNN to fault diagnosis of rolling element bearings. J. Vib. Acoust. 131, 6 (2009)
 74. A.B. Andre, E. Beltrame, J. Wainer, A combination of support vector machine and k-nearest neighbors for machine fault detection. Applied Artificial Intelligence: An Int. J. 27(1), 36–49 (2013)
 75. Q. Wang, Y. Liu, X. He, S. Liu, J. Liu, Fault diagnosis of bearing based on KPCA and KNN method. Advanced Materials Research 986–987, 1491–1496 (2014)
 76. S. Dong, X. Xu, R. Chen, Application of fuzzy C-means method and classification model of optimized K-nearest neighbor for fault diagnosis of bearing. J. Braz. Soc. Mech. Sci. Eng. 38(8), 2255–2263 (2016)
 77. P. Baraldi, F. Cannarile, F.D. Maio, E. Zio, Hierarchical k-nearest neighbours classification and binary differential evolution for fault diagnostics of automotive bearings operating under variable conditions. Eng. App. of Artificial Intell. 56, 1–13 (2016)
 78. R.K. Sharma, V. Sugumaran, H. Kumar, M. Amarnath, Condition monitoring of roller bearing by k-star classifier and k-nearest neighborhood classifier using sound signal. SDHM Structural Durability and Health Monitoring 12(1), 1–16 (2017)
 79. S. Dong, T. Luo, L. Zhong, L. Chen, X. Xu, Fault diagnosis of bearing based on the kernel principal component analysis and optimized k-nearest neighbour model. J. Low Freq. Noise Vib. Active Control 36(4), 354–365 (2017)
 80. G. Huang, Q. Zhu, C. Siew, Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)
 81. R. Razavi-Far, M. Saif, Ensemble of extreme learning machines for diagnosing bearing defects in nonstationary environments under class imbalance condition. in 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (2016)
 82. G. Ditzler, R. Polikar, N. Chawla, An incremental learning algorithm for nonstationary environments and class imbalance. in International Conference on Pattern Recognition (ICPR) (2010), pp. 2997–3000
 83. G. Ditzler, R. Polikar, Incremental learning of concept drift from streaming imbalanced data. IEEE Trans. Knowl. Data Eng. 25(10), 2283–2301 (2013)
 84. W. Mao, L. He, Y. Yan, J. Wang, Online sequential prediction of bearings imbalanced fault diagnosis by extreme learning machine. Mech. Syst. Signal Proc. 83, 450–473 (2017)
 85. V. Sugumaran, K.I. Ramachandran, Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing. Mech. Syst. Signal Proc. 21(5), 2237–2247 (2007)
 86. V. Sugumaran, K.I. Ramachandran, Fault diagnosis of roller bearing using fuzzy classifier and histogram features with focus on automatic rule learning. Expert Syst. Appl. 38(5), 4901–4907 (2011)
 87. J. Yu, Local and nonlocal preserving projection for bearing defect classification and performance assessment. IEEE Trans. Ind. Electr. 59(5), 2363–2376 (2012)
 88. P.K. Kankar, S.C. Sharma, S.P. Harsha, Rolling element bearing fault diagnosis using wavelet transform. Neurocomputing 74(5), 1638–1645 (2011)
 89.S. Cao, X. Ma, Y. Zhang, L. Luo, F. Yi, A fault diagnosis method based on semisupervised fuzzy cmeans cluster analysis. Inter. J. on Cyber. & Informatics (IJCI) 4(2), 281–289 (2015)Google Scholar
 90.T. Han, D. Jiang, Rolling bearing fault diagnostic method based on VMDAR model and random forest classifier. Shock Vib 216, 11 (2016). Article ID 5132046 Google Scholar
 91.Y. Mohsenzadeh, H. Sheikhzadeh, A.M. Reza, N. Bathaee, M.M. Kalayeh, The relevance samplefeature machine: a sparse bayesian learning approach to joint featuresample selection. IEEE Trans. Cybern. 43(6), 2241–2254 (2013)Google Scholar
 92.P.K. Wong, J. Zhong, Z. Yang, C.M. Vong, A new framework for intelligent simultaneousfault diagnosis of rotating machinery using pairwisecoupled sparse Bayesian extreme learning committee machine. Arch. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 19891996 203–210, 16 (2016)Google Scholar
 93.F. Shen, C. Chen, R. Yan, R.X. Gao, Bearing fault diagnosis based on SVD feature extraction and transfer learning classification. in Prognostics and System Health Management Conference (PHM) (IEEE, 2015), pp. 1–6Google Scholar
 94.S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)Google Scholar
 95.B. Lei, L.Y. Soon, E.L. Tan, Robust SVDbased audio watermarking scheme with differential evolution optimization. IEEE Trans. Audio Speech Lang. Process. 21(1), 2368–2378 (2013)Google Scholar
 96.H.S. Seung, D.L. Daniel, The manifold ways of perception. Science 290, 2268–2269 (2000)Google Scholar
 97.S. Kadoury, M.D. Levine, Face detection in gray scale images using locally linear embeddings. Comput. Vis. Image Underst. 105, 1–20 (2007)Google Scholar
 98.X. Liu, D. Tosun, M.W. Weiner, N. Schuff, Locally linear embedding (LLE) for MRI based Alzheimer’s disease classification. Neuroimage 83, 148–157 (2013)Google Scholar
 99. K. Kim, J. Lee, Sentiment visualization and classification via semi-supervised nonlinear dimensionality reduction. Pattern Recognit. 47, 758–768 (2014)
 100. J.H. Yang, J.W. Xu, D.B. Yang, Noise reduction method for nonlinear time series based on principal manifold learning and its application to fault diagnosis. Chin. J. Mech. Eng. 42, 154–158 (2006)
 101. X. Wang, Y. Zheng, Z. Zhao, J. Wang, Bearing fault diagnosis based on statistical locally linear embedding. Sensors 15, 16225–16247 (2015)
 102. S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
 103. Y. Wang, G. Xu, L. Liang, K. Jiang, Detection of weak transient signals based on wavelet packet transform and manifold learning for rolling element bearing fault diagnosis. Mech. Syst. Signal Process. 54–55, 259–276 (2015)
 104. A. Hertzmann, Introduction to Bayesian Learning, Course Notes (University of Toronto, Ontario, 2004)
 105. M.R.G. Meireles, P.E.M. Almeida, M.G. Simões, A comprehensive review for industrial applicability of artificial neural networks. IEEE Trans. Ind. Electr. 50(3), 585–601 (2003)
 106. N. Qian, On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)
 107. K.F. Al-Raheem, W. Abdul-Karem, Rolling bearing fault diagnostics using artificial neural networks based on Laplace wavelet analysis. Int. J. Eng. Sci. Technol. 2(6), 278–290 (2010)
 108. M. Nielsen, Chapter 6, Neural Networks and Deep Learning (2015)
 109. A.T. Vemuri, M.M. Polycarpou, Neural-network-based robust fault diagnosis in robotic systems. IEEE Trans. Neural Netw. 8(6), 1410–1420 (1997)
 110. V.N. Ghate, S.V. Dudul, Cascade neural-network-based fault classifier for three-phase induction motor. IEEE Trans. Ind. Electr. 58(5), 1555–1563 (2011)
 111. S.S. Moosavi, A. Djerdir, Y. Ait-Amirat, D.A. Khaburi, A. N’Diaye, Artificial neural network-based fault diagnosis in the AC–DC converter of the power supply of series hybrid electric vehicle. IET Electr. Syst. Transp. 6(2), 96–106 (2016)
 112. B. Li, M. Chow, Y. Tipsuwan, J.C. Hung, Neural-network-based motor rolling bearing fault diagnosis. IEEE Trans. Ind. Electr. 47(5), 1060–1069 (2000)
 113. B. Samanta, K.R. Al-Balushi, Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mech. Sys. and Sig. Proc. 17(2), 317–328 (2003)
 114. D.H. Pandya, S.H. Upadhyay, S.P. Harsha, ANN based fault diagnosis of rolling element bearing using time-frequency domain feature. Int. J. Eng. Science and Technology (IJEST) 4(06), 2878–2886 (2012)
 115. B. Samanta, K.R. Al-Balushi, S.A. Al-Araimi, Bearing fault detection using artificial neural networks and genetic algorithm. J. on Applied Sig. Processing 2004(3), 366–377 (2004)
 116. H. Yang, J. Mathew, L. Ma, V. Kosse, Matching pursuit feature based neural network pattern recognition of ball bearing faults. in International Conference of Maintenance Societies (Australia, 2004), pp. 25–28
 117. N. Gebraeel, M. Lawley, R. Liu, V. Parmeshwaran, Residual life predictions from vibration-based degradation signals: a neural network approach. IEEE Trans. Ind. Electr. 51(3), 694–700 (2004)
 118. V. Hariharan, P.S.S. Srinivasan, New approach of classification of rolling element bearing fault using artificial neural network. J. Mech. Eng. 40(2), 119–130 (2009)
 119. M. Delgado, G. Cirrincione, A.G. Espinosa, J.A. Ortega, H. Henao, Bearing faults detection by a novel condition monitoring scheme based on statistical-time features and neural networks. IEEE Trans. Ind. Electr. 60(8), 3398–3407 (2013)
 120. M. Unal, M. Demetgul, M. Onat, H. Kucuk, Fault diagnosis of rolling bearing based on feature extraction and neural network algorithm. Recent Adv. Telecom Signal Syst 179–185 (2013)
 121. S.S. Refaat, H. Abu-Rub, M.S. Saad, E.M. Aboul-Zahab, A. Iqbal, ANN-based for detection, diagnosis the bearing fault for three phase induction motors using current signal. in 2013 IEEE International Conference on Industrial Technology (ICIT), (2013), pp. 253–258
 122. J.P. Patel, S.H. Upadhyay, Comparison between artificial neural network and support vector method for a fault diagnostics in rolling element bearings. Proc. Eng. 12th Int. Conf. Vib. Probl. ICOVP2015 144, 390–397 (2016)
 123. D.K. Gaud, P. Jayaswal, Effects of artificial neural network parameters on rolling element bearing fault diagnosis. Int. J. Curr Eng. Sci. Res. 3(1), 55–60 (2016)
 124. N. Zhao, H. Zheng, L. Yang, Z. Wang, A fault diagnosis approach for rolling element bearing based on S-transform and artificial neural network. in Proceedings of ASME Turbo Expo 2017: Turbomachinery Technical Conference and Exposition GT2017, USA, (2017)
 125. R.G. Stockwell, L. Mansinha, R.P. Lowe, Localization of the complex spectrum: the S transform. IEEE Trans. Signal Process. 44(4), 998–1001 (1996)
 126. J.B. Ali, L. Saidi, A. Mouelhi, B. Chebel-Morello, F. Fnaiech, Linear feature selection and classification using PNN and SFAM neural networks for an early online diagnosis of bearing naturally progressing degradations. Eng. Appl. Artif. Intell. 42, 67–81 (2015)
 127. A.A. Jaber, R. Bicker, Fault diagnosis of industrial robot bearings based on discrete wavelet transform and artificial neural network. Int. J. Progn. Health Manag. 017, 13 (2016). ISSN 2153-2648
 128. J. Zheng, H. Pan, J. Cheng, Rolling bearing fault detection and diagnosis based on composite multiscale fuzzy entropy and ensemble support vector machines. Mech. Syst. Signal Process. 85, 746–759 (2017)
 129. D. Yao, J. Yang, Y. Bai, X. Cheng, Railway rolling bearing fault diagnosis based on multiscale intrinsic mode function permutation entropy and extreme learning machine classifier. Adv. Mech. Eng. 8(10), 1–9 (2016)
 130. Q. Tong, J. Cao, B. Han, X. Zhang, Z. Nie, J. Wang, Y. Lin, W. Zhang, A fault diagnosis approach for rolling element bearings based on RSGWPT-LCD bilayer screening and extreme learning machine. IEEE Access 5, 5515–5530 (2017)
 131. M. Liang, D. Su, D. Hu, M. Ge, A novel faults diagnosis method for rolling element bearings based on ELCD and extreme learning machine. Shock Vib. 2018, 10 (2018). Article ID 1891453
 132. L.B. Jack, A.K. Nandi, Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mech. Syst. Signal Process. 16(2–3), 373–390 (2002)
 133. P. Jayaswal, S.N. Verma, A.K. Wadhwani, Development of EBP-Artificial neural network expert system for rolling element bearing fault diagnosis. J. Vib. Control 17(8), 1131–1148 (2011)
 134. H.M. Ertunc, H. Ocak, C. Aliustaoglu, ANN- and ANFIS-based multi-staged decision algorithm for the detection and diagnosis of bearing faults. Neural Comput. Appl. 22(1), S435–S446 (2013)
 135. B.A. Paya, I.I. Esat, Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as a preprocessor. Mech. Syst. Signal Process. 11(5), 751–765 (1997)
 136. Y. Yu, Y. Dejie, C. Junsheng, A roller bearing fault diagnosis method based on EMD energy entropy and ANN. J. Sound Vib. 294, 269–277 (2006)
 137. K.F. Al-Raheem, A. Roy, K.P. Ramachandran, D.K. Harrison, S. Grainger, Application of the Laplace-wavelet combined with ANN for rolling bearing fault diagnosis. J. Vib. Acoust. 130, 9 (2008)
 138. Y. Hwang, K. Jen, Y. Shen, Application of cepstrum and neural network to bearing fault detection. J. Mech. Sci. Technol. 23, 2730–2737 (2009)
 139. K. Al-Raheem, Wavelet analysis and neural networks for bearing fault diagnosis. Advances in Wavelet Theory and Their Applications in Eng., Physics and Technology, (2012), pp. 313–352
 140. J.B. Ali, B. Chebel-Morello, L. Saidi, S. Malinowski, F. Fnaiech, Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network. Mech. Syst. Signal Process. 56–57, 150–172 (2015)
 141. J.B. Ali, N. Fnaiech, L. Saidi, B. Chebel-Morello, F. Fnaiech, Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals. Appl. Acoust. 89, 16–27 (2015)
 142. R. Dubey, D. Agrawal, Bearing fault classification using ANN-based Hilbert footprint analysis. IET Sci. Meas. Technol. 9(8), 1016–1022 (2015)
 143. Q. Hu, Z. He, Z. Zhang, Y. Zi, Fault diagnosis of rotating machinery based on improved wavelet package transform and SVMs ensemble. Mech. Syst. Signal Process. 21, 688–705 (2007)
 144. J. Yang, Y. Zhang, Y. Zhu, Intelligent fault diagnosis of rolling element bearing based on SVMs and fractal dimension. Mech. Syst. Signal Process. 21, 2012–2024 (2007)
 145. L. Guo, J. Chen, X. Li, Rolling bearing fault classification based on envelope spectrum and support vector machine. J. Vib. Control 15(9), 1349–1363 (2009)
 146. P. Konar, P. Chattopadhyay, Bearing fault detection of induction motor using wavelet and support vector machines (SVMs). Appl. Soft Comput. 11, 4203–4211 (2011)
 147. S. Wu, P. Wu, C. Wu, J. Ding, C. Wang, Bearing fault diagnosis based on multiscale permutation entropy and support vector machine. Entropy 14, 1343–1356 (2012)
 148. Z. Liu, H. Cao, X. Chen, Z. He, Z. Shen, Multi-fault classification based on wavelet SVM with PSO algorithm to analyze vibration signals from rolling element bearings. Neurocomputing 99, 399–410 (2013)
 149. X. Zhang, Y. Liang, J. Zhou, Y. Zang, A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement 69, 164–179 (2015)
 150. L. Saidi, J.B. Ali, F. Fnaiech, Application of higher order spectral features and support vector machines for bearing faults classification. ISA Trans. 54, 193–206 (2015)
 151. Y. Li, M. Xu, H. Zhao, W. Huang, Hierarchical fuzzy entropy and improved support vector machine based binary tree approach for rolling bearing fault diagnosis. Mech. Mach. Theory 98, 114–132 (2016)
 152. J. Tian, C. Morillo, M.H. Azarian, M. Pecht, Motor bearing fault detection using spectral kurtosis-based feature extraction coupled with k-nearest neighbor distance analysis. IEEE Trans. Ind. Electr. 63, 3 (2016)
 153. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
 154. I. Goodfellow, Y. Bengio, A. Courville, Deep learning. MIT Press, www.deeplearningbook.org (2016)
 155. W. Yan, L. Yu, On accurate and reliable anomaly detection for gas turbine combustors: A deep learning approach. in Annual Conference of The Prognostics and Health Management Society 2015, vol. 6, 2015
 156. H. Dong, L. Yang, H. Li, Small fault diagnosis of front-end speed controlled wind generator based on deep learning. WSEAS Trans. Circ. Syst. 15, 64–72 (2016)
 157. F. Lv, C. Wen, Z. Bao, M. Liu, Fault diagnosis based on deep learning. 2016 American Control Conference (ACC). Boston Marriott Copley Place, Boston, MA, USA, July 6–8, (2016)
 158. H. Liu, C. Liu, Y. Huang, Adaptive feature extraction using sparse coding for machinery fault diagnosis. Mech. Syst. Signal Process. 25(2), 558–574 (2011)
 159. N.K. Verma, V.K. Gupta, M. Sharma, R.K. Sevakula, Intelligent condition based monitoring of rotating machines using sparse autoencoders. in Proceedings of IEEE Conference on Prognostics and Health Management, (Gaithersburg, 2013) pp. 1–7, June 24–27
 160. S. Min, B. Lee, S. Yoo, Deep learning in bioinformatics. Briefings Bioinf 18, 851–869 (2017)
 161. D. Lee, V. Siu, R. Cruz, C. Yetman, Convolutional neural net and bearing fault analysis. in Proceedings of the International Conference on Data Mining (DMIN’16), (2016), pp. 194–200
 162. X. Guo, L. Chen, C. Shen, Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 93, 490–502 (2016)
 163. X. Ding, Q. He, Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis. IEEE Trans. Instr. Meas. 66(8), 1926–1935 (2017)
 164. O. Janssens, V. Slavkovikj, B. Vervisch, K. Stockman, M. Loccufier, S. Verstockt, R. Van de Walle, S. Van Hoecke, Convolutional neural network based fault detection for rotating machinery. J. Sound Vib. 377, 331–345 (2016)
 165. W. You, C. Shen, X. Guo, Z. Zhu, Bearing fault diagnosis using convolution neural network and support vector regression. in 2017 International Conference on Mech. Engineering and Cont. Automation, (2017), pp. 6–11
 166. W. You, C. Shen, X. Guo, X. Jiang, J. Shi, Z. Zhu, A hybrid technique based on convolutional neural network and support vector regression for intelligent diagnosis of rotating machinery. Adv. Mech. Eng. 9(6), 1–17 (2017)
 167. W. Zhang, G. Peng, C. Li, Y. Chen, Z. Zhang, A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals. Sensors 17(425), 1–21 (2017)
 168. W. Fuan, J. Hongkai, S. Haidong, D. Wenjing, W. Shuaipeng, An adaptive deep convolutional neural network for rolling bearing fault diagnosis. Meas. Sci. Technol. 28(9), 1–25 (2017)
 169. S. Li, G. Liu, X. Tang, J. Lu, J. Hu, An ensemble deep convolutional neural network model with improved D-S evidence fusion for bearing fault diagnosis. Sensors 17(1729), 1–19 (2017)
 170. Y. Xie, T. Zhang, Fault diagnosis for rotating machinery based on convolutional neural network and empirical mode decomposition. Shock Vib. 2017, 12 (2017)
 171. C. Lu, Z. Wang, B. Zhou, Intelligent fault diagnosis of rolling bearing using hierarchical convolutional network based health state classification. Adv. Eng. Inf. 32, 139–151 (2017)
 172. R. Socher, C.C. Lin, A.Y. Ng, C.D. Manning, Parsing natural scenes and natural language with recursive neural networks. The 28th International Conference on Machine Learning (ICML 2011), (2011)
 173. K. Cho, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (Doha, Qatar, October 25–29, 2014), pp. 1724–1734
 174. R. Dey, F.M. Salem, Gate-variants of gated recurrent unit (GRU) neural networks. in IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 2017 (2017), pp. 5
 175. W. Abed, S. Sharma, R. Sutton, A. Motwani, A robust bearing fault detection and diagnosis technique for brushless DC motors under nonstationary operating conditions. J. Control Autom. Electr. Syst. 26, 14 (2015)
 176. A. Malhi, R. Yan, R.X. Gao, Prognosis of defect propagation based on recurrent neural networks. in IEEE Transaction on Instrumentation and Measurement, (vol. 60, no. 3, March 2011)
 177. S. Sharma, W. Abed, R. Sutton, B. Subudhi, Corrosion fault diagnosis of rolling element bearing under constant and variable load and speed conditions. IFAC-PapersOnLine 48–30, 049–054 (2015)
 178. Y. Xie, T. Zhang, The application of echo state network and recurrent multilayer perceptron in rotating machinery fault prognosis. in Proceedings of 2016 IEEE Chinese Guidance, Navigation and Control Conference, (China, 2016), pp. 2286–2291
 179. L. Guo, N. Li, F. Jia, Y. Lei, J. Lin, A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 240, 98–109 (2017)
 180. Q. Cui, Z. Li, J. Yang, B. Liang, Rolling bearing fault prognosis using recurrent neural network. in 29th Chinese Control And Decision Conference (CCDC), (2017), pp. 1196–1201
 181. G.E. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18, 16 (2006)
 182. G. Hinton, Deep belief nets. Encycl. Mach. Learn. 4, 5947 (2010)
 183. R. Salakhutdinov, G. Hinton, Deep Boltzmann machines. in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics 2009, (Florida, USA. vol. 5 of JMLR: W&CP 2009), p. 5
 184. T. Jie, L. Yi-Lun, Y. Da-Lian, T. Fang, L. Chi, Fault diagnosis of rolling bearing using deep belief networks. in International Symposium on Material, Energy and Environment Engineering, (2015), pp. 566–569
 185. H. Shao, H. Jiang, X. Zhang, M. Niu, Rolling bearing fault diagnosis using an optimization deep belief network. Meas. Sci. Technol. 26(115002), 17 (2015)
 186. X. Wang, Y. Li, T. Rui, H. Zhu, J. Fei, Bearing fault diagnosis method based on Hilbert envelope spectrum and deep belief network. J. Vibroeng. 17(3), 1295–1308 (2015)
 187. R. Zhang, L. Wu, X. Fu, B. Yao, Classification of bearing data based on deep belief networks. in Prognostics and System Health Management Conference (PHM-Chengdu), (2016), pp. 1–6
 188. M. Ma, X. Chen, S. Wang, Y. Liu, W. Li, Bearing degradation assessment based on Weibull distribution and deep belief network. in 2016 International Symposium on Flexible Automat., (Ohio, U.S.A., 2016), pp. 1–4
 189. Y. Liu, D. Yang, Bearing fault diagnosis based on deep belief network and multisensor information fusion. Shock Vib. 2016, 9 (2016). (Article ID 9306205)
 190. A. Yin, J. Lu, Z. Dai, J. Li, Q. Ouyang, Isomap and deep belief network-based machine health combined assessment model. Strojniški vestnik J. Mech. Eng. 62(12), 740–750 (2016)
 191. M. Gan, C. Wang, C. Zhu, Construction of hierarchical diagnosis network based on deep learning and its application in the fault pattern recognition of rolling element bearings. Mech. Syst. Signal Process. 72–73, 92–104 (2016)
 192. J. Deutsch, M. He, D. He, Remaining useful life prediction of hybrid ceramic bearings using an integrated deep learning and particle filter approach. Appl. Sci. 7(649), 17 (2017)
 193. S. Devendiran, K. Manivannan, S.C. Kamani, R. Refai, An early bearing fault diagnosis using effective feature selection methods and data mining techniques. Int. J. Eng. Technol. (IJET) 7(2), 583–598 (2015)
 194. R. Zhang, Z. Peng, L. Wu, B. Yao, Y. Guan, Fault diagnosis from raw sensor data using deep neural networks considering temporal coherence. Sensors 17(549), 17 (2017)
 195. J. Deutsch, D. He, Using deep learning-based approach to predict remaining useful life of rotating components. IEEE Trans. Syst. Man Cybern. Syst. 48(1), 11–20 (2018)
 196. H. Shao, H. Jiang, H. Zhang, T. Liang, Electric locomotive bearing fault diagnosis using a novel convolutional deep belief network. IEEE Trans. Ind. Electr. 65(3), 2727–2736 (2018)
 197. H. Oh, J.H. Jung, B.C. Jeon, B.D. Youn, Scalable and unsupervised feature engineering using vibration-imaging and deep learning for rotor system diagnosis. IEEE Trans. Ind. Electr. 65(4), 3539–3549 (2018)
 198. S. Deng, Z. Cheng, C. Li, X. Yao, Z. Chen, R.V. Sanchez, Rolling bearing fault diagnosis based on deep Boltzmann machines. in 2016 Prognostics and System Health Management Conference (PHM-Chengdu), (2016), pp. 19–21
 199. L. Liao, W. Jin, R. Pavel, Enhanced restricted Boltzmann machine with prognosability regularization for prognostics and health assessment. IEEE Trans. Ind. Electr. 63(11), 7076–7083 (2016)
 200. X. He, D. Wang, Y. Li, C. Zhou, A novel bearing fault diagnosis method based on Gaussian restricted Boltzmann machine. Math. Probl. Eng. 2016, 8 (2016). Article ID 2957083
 201. K.H. Cho, A. Ilin, T. Raiko, Improved learning of Gaussian-Bernoulli restricted Boltzmann machines. in Artificial Neural Networks and Machine Learning—ICANN 2011, (Springer Berlin Heidelberg: Berlin, Germany, vol. 6791), pp. 10–17
 202. C. Li, R. Sánchez, G. Zurita, M. Cerrada, D. Cabrera, Fault diagnosis for rotating machinery using vibration measurement deep statistical feature learning. Sensors 16(895), 19 (2016)
 203. J. Deutsch, D. He, Using deep learning based approaches for bearing remaining useful life prediction. in Annual Conference of the Prognostics and Health Management Society, (2016)
 204. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
 205. W. Lu, X. Wang, C. Yang, T. Zhang, A novel feature extraction method using deep neural network for rolling bearing fault diagnosis. in The 27th Chinese Control and Decision Conference (2015 CCDC). (IEEE, 2015) pp. 2427–2431
 206. F. Jia, Y. Lei, J. Lin, X. Zhou, N. Lu, Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process. 72, 303–315 (2016)
 207. T. Junbo, L. Weining, A. Juneng, W. Xueqian, Fault diagnosis method study in roller bearing based on wavelet transform and stacked autoencoder. in The 27th Chinese Control and Decision Conference (2015 CCDC) (IEEE, 2015), pp. 4608–4613
 208. W. Zhao, C. Lu, J. Ma, Z. Wang, A deep learning method using SDA combined with dropout for bearing fault diagnosis. Vibroeng. Proc. 5(151), 156 (2015)
 209. H.O.A. Ahmed, M.L.D. Wong, A.K. Nandi, Effects of deep neural network parameters on classification of bearing faults. in IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, (2016), pp. 6329–6334
 210. L. Guo, H. Gao, H. Huang, X. He, S. Li, Multifeatures fusion and nonlinear dimension reduction for intelligent bearing condition monitoring. Shock Vib. 2016, 10 (2016). (Article ID 4632562)
 211. S. Tao, T. Zhang, J. Yang, X. Wang, W. Lu, Bearing fault diagnosis method based on stacked autoencoder and softmax regression. in Control Conference (CCC), 2015 34th Chinese. (IEEE, 2015), pp. 6331–6335
 212. H. Liu, L. Li, J. Ma, Rolling bearing fault diagnosis based on STFT-deep learning and sound signals. Shock Vib. 2016, 12 (2016). (Article ID 6127479)
 213. W. Mao, J. He, Y. Li, Y. Yan, Bearing fault diagnosis with autoencoder extreme learning machine: a comparative study. Proc. Mech. E Part C J. Mech. Eng. Sci. 231, 1560–1578 (2016)
 214. R. Thirukovalluru, S. Dixit, R.K. Sevakula, N.K. Verma, A. Salour, Generating feature sets for fault diagnosis using denoising stacked autoencoder. in 2016 IEEE International Conference on Prognostics and Health Management (ICPHM) (IEEE, 2016), pp. 1–7
 215. X. Guo, C. Shen, L. Chen, Deep fault recognizer: an integrated model to denoise and extract features for fault diagnosis in rotating machinery. Appl. Sci. 7(41), 1–17 (2017)
 216. Z. Chen, W. Li, Multisensor feature fusion for bearing fault diagnosis using sparse auto encoder and deep belief network. IEEE Trans. Instr. Meas. 66(7), 1693–1702 (2017)
 217. C. Lu, Z. Wang, W. Qin, J. Ma, Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 130, 377–388 (2017)
 218. R.M. Hasani, G. Wang, R. Grosu, An automated autoencoder correlation-based health-monitoring and prognostic method for machine bearings. arXiv:1703.06272v1 [cs.LG] (2017)
 219. M. Sohaib, C. Kim, J. Kim, A hybrid feature model and deep-learning-based bearing fault diagnosis. Sensors 17(2876), 1–16 (2017)
 220. H. Shao, H. Jiang, F. Wang, H. Zhao, An enhancement deep feature fusion method for rotating machinery fault diagnosis. Knowl Based Syst 119, 200–220 (2017)
 221. A. Shaheryar, X. Yin, W.Y. Ramay, Deep-learning framework: an application for fault identification in rotary machines. Int. J. Comput. Appl. (0975–8887) 167(4), 37–45 (2017)
 222. J. Sun, C. Yan, J. Wen, Intelligent bearing fault diagnosis method combining compressed data acquisition and deep learning. IEEE Trans. Instr. Meas. 67(1), 185–195 (2018)
 223. H.O.A. Ahmed, M.L.D. Wong, A.K. Nandi, Intelligent condition monitoring method for bearing faults from highly compressed measurements using sparse overcomplete features. Mech. Syst. Signal Process. 99, 459–477 (2018)
 224. D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, G. Yang, Deep learning for health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
 225. D.L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)