
1 Introduction

Traditional signal representation methods often express a signal as a linear combination of orthogonal atoms that form a complete dictionary. Because such atoms are not adapted to the signal, a large number of representation coefficients is required to recover it. In contrast, with a specially designed over-complete dictionary, the signal can be represented by a linear combination of only a few well-matched atoms, so the representation coefficients are sparse. Sparse representation has proven to be a powerful tool in signal processing, image processing, computer vision and pattern recognition [1,2,3]. Recently, much work has been done to introduce sparse representation theory into fault feature extraction from mechanical vibration signals [4,5,6].

The main purpose of fault diagnosis is to ensure the availability, reliability and operational safety of equipment. Extracting fault features that distinguish the faulty condition from the normal condition is one of the most important tasks in fault diagnosis. When a defect occurs on a rotating element (e.g., a gear or a bearing), it interacts with a mating element and produces a series of impulses in the vibration signal. Under constant-speed operation, the vibration response consists of periodic impulses. These transients have similar waveform morphology in the time domain and are therefore sparse. Because of these transient and sparse properties, the fault feature extraction task can be recast as the sparse representation of the fault features.

Signal sparse representation consists of two main aspects: dictionary construction and the optimization solution. With a suitably constructed dictionary, a few atoms can be combined to represent the fault features effectively, whereas the same atoms represent the noise poorly. Thus, a sparse representation of the fault features is obtained and the background noise is removed at the same time. Moreover, with an efficient algorithm, the representation coefficients can be computed easily.

This chapter presents an overview of sparse representation theory and shows how the two main aspects of sparse representation are handled in mechanical vibration signal processing. Applications to mechanical fault feature detection, such as the detection of rolling bearing faults, gearbox faults and compound bearing faults, are also explored.

2 Sparse Representation Theory

2.1 Sparse Representation Model

Consider a matrix \(\varvec{A} \in R^{N \times M}\) with \(N < M\) whose columns are the atoms \(\left\{ {\varvec{a}_{i} } \right\}_{i = 1}^{M}\), and assume that \(N\) of these columns are linearly independent. The matrix \(\varvec{A}\) then spans an \(N\)-dimensional Hilbert space. Suppose the measured fault vibration signal can be written as

$$y(t) = x(t) + n(t)$$
(1)

where \(y(t)\) is the measured vibration signal, \(x(t)\) is the fault-induced signal component without noise and \(n(t)\) is the noise. Equation (1) can also be written as

$$\varvec{y} = \varvec{x} + \varvec{n}$$
(2)

\(\varvec{x}\) can be represented with an over-complete matrix \(\varvec{A}\) as

$$\varvec{x} = \sum\limits_{i = 1}^{M} {c_{i} \varvec{a}_{i} }$$
(3)

or, more compactly, \(\varvec{x} = \varvec{Ac}\), where \(\varvec{c}\) is an \(M \times 1\) column vector of representation coefficients. If \(\varvec{x}\) is not in the span of the columns of \(\varvec{A}\), Eq. (3) has no solution; otherwise, it has infinitely many solutions, and the general solution has \(l\) free parameters, where \(l\) is the difference between the number of variables and the rank of \(\varvec{A}\).
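The following minimal NumPy sketch makes this counting argument concrete; the 3 × 5 matrix and the data are random and purely hypothetical. The rank is 3, so there are \(l = 5 - 3 = 2\) free parameters. Adding any vector from the null space of \(\varvec{A}\) to one solution gives another exact solution.

```python
import numpy as np

# Hypothetical small example: N = 3 equations, M = 5 atoms (columns), N < M.
rng = np.random.default_rng(0)
N, M = 3, 5
A = rng.standard_normal((N, M))
x = rng.standard_normal(N)

rank = np.linalg.matrix_rank(A)
free_params = M - rank                 # l = number of variables minus the rank

# One particular solution: the minimum-l2-norm solution given by the pseudo-inverse.
c0 = np.linalg.pinv(A) @ x

# Orthonormal basis of the null space of A: the last M - rank right singular vectors.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T               # shape (M, M - rank)

# Adding any null-space combination yields another exact solution of A c = x.
c1 = c0 + null_basis @ rng.standard_normal(free_params)

print("rank:", rank, "free parameters:", free_params)
print("A c0 = x:", np.allclose(A @ c0, x), " A c1 = x:", np.allclose(A @ c1, x))
```

Because both `c0` and `c1` reproduce `x` exactly, the data alone cannot decide between them. This is why an additional criterion \(J(\varvec{c})\) is introduced next.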

Some of these solutions may be preferable to others. To narrow the choice to a well-defined solution, additional criteria are needed [7]. The traditional way to achieve this is to introduce a regularization function \(J(\varvec{c})\) and define the general optimization problem:

$$\mathop {\hbox{min} }\limits_{\varvec{c}} J(\varvec{c})\quad {\text{s}}.{\text{t}}.\;\;\varvec{Ac} = \varvec{x}$$
(4)

There are many possible choices for the objective function \(J(\varvec{c})\). A well-known choice is the \(l_{p}\)-norm \(\left\| \cdot \right\|_{p}\), defined as

$$\left\| \varvec{c} \right\|_{p} = \left( {\sum\nolimits_{i} {\left| {c_{i} } \right|^{p} } } \right)^{{\frac{1}{p}}}$$
(5)

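Letting \(p \to 0\), the \(l_{0}\)-norm is obtained as the limit of the \(p\)-th power of the \(l_{p}\)-norm. (The limit of \(\left\| \varvec{c} \right\|_{p}\) itself diverges whenever more than one entry of \(\varvec{c}\) is non-zero.)

$$\left\| \varvec{c} \right\|_{0} = \mathop {\lim }\limits_{p \to 0} \left\| \varvec{c} \right\|_{p}^{p} = \mathop {\lim }\limits_{p \to 0} \sum\limits_{i} {\left| {c_{i} } \right|^{p} }$$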
(6)

In practice, the \(l_{0}\)-norm (which, strictly speaking, is not a norm) is usually defined directly as

$$\left\| \varvec{c} \right\|_{0} = \# \left\{ {i{:}c_{i} \ne 0} \right\}$$
(7)

which counts the non-zero elements of a vector. For the underdetermined linear system in Eq. (3), the aim is to find the sparsest coefficient vector \(\varvec{c}\) that “explains” the signal \(\varvec{x}\). This is the solution with the fewest non-zero elements, and hence the smallest \(l_{0}\)-norm, which leads to the optimization problem

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{0} \quad {\text{s}}.{\text{t}}.\;\;\varvec{Ac} = \varvec{x}$$
(8)
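A short numerical check of the relation between the \(l_{p}\)-norm and the counting definition in Eq. (7). It uses a hypothetical vector with three non-zero entries. As \(p\) decreases, \(\sum_i |c_i|^p\) approaches 3, while the \(l_{p}\)-norm with its \(1/p\) exponent grows without bound.

```python
import numpy as np

def lp_norm(c, p):
    """l_p-norm of Eq. (5): (sum_i |c_i|^p)^(1/p)."""
    return np.sum(np.abs(c) ** p) ** (1.0 / p)

def l0_count(c, tol=0.0):
    """Counting definition of Eq. (7): number of entries with |c_i| > tol."""
    return int(np.count_nonzero(np.abs(c) > tol))

c = np.array([0.0, 3.0, 0.0, -0.5, 2.0])      # hypothetical vector, three non-zeros

for p in (1.0, 0.5, 0.1, 0.01):
    s = np.sum(np.abs(c) ** p)                 # zero entries contribute 0 for any p > 0
    print(f"p={p}: sum |c_i|^p = {s:.4f}, ||c||_p = {lp_norm(c, p):.3g}")
print("l0 count:", l0_count(c))
```

The `tol` argument is an addition for numerical work, where "zero" usually means "below a small threshold".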

Considering the noise component in the measured signal \(\varvec{y}\), Eq. (8) can be written as

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{0} \quad {\text{s}}.{\text{t}}.\quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \;\; \le \varepsilon$$
(9)

Equation (9) is the sparse representation model.

After the construction of the representation model, two main aspects should be taken into consideration: (a) how to construct a suitable dictionary \(\varvec{A}\) to ensure the sparsity of the coefficient vector \(\varvec{c}\); (b) how to solve the model to obtain the sparse representation vector. These two problems will be elaborated in the following.

2.2 Construction of the Over-Complete Dictionary

Signal representation can be regarded as a way to observe and learn the features of a signal from different perspectives. Signal processing techniques commonly require meaningful representations that capture the useful characteristics of the signal [8]. If the measured signal can be sparsely represented over a dictionary, only a few atoms of the dictionary need to be combined to form the signal. Only a few coefficients are then needed to capture the information of interest. It is therefore vital to construct a dictionary that leads to a sparse representation of the signal features. Over-complete dictionaries have been established for this purpose and are commonly used. Three typical examples are described below.

(a) Gabor dictionary

Gabor atoms, proposed by Dennis Gabor, form a family of functions built by scaling, translating and modulating a generating function. The Gabor atom is defined as

$$g_{\gamma } (t) = \frac{1}{\sqrt s }g(\frac{t - u}{s}){\text{e}}^{j\xi t}$$
(10)

where \(g(t) = {\text{e}}^{{ - \pi t^{2} }}\) is the Gaussian window function and \(\gamma = (s,u,\xi )\) is the parameter set of the atom. Here \(s\) is the scaling factor, \(u\) is the translation factor and \(\xi\) is the frequency factor. The function \(g_{\gamma } (t)\) is centered at \(u\), and its energy is mostly concentrated in a neighborhood of \(u\) whose size is proportional to \(s\).
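A minimal sketch of Eq. (10), assuming that \(\xi\) is an angular frequency in rad/s (it multiplies \(t\) inside \(e^{j\xi t}\)). The sampling rate and atom parameters are hypothetical. The unit-norm atom peaks in time at \(u\) and in frequency at \(\xi/2\pi\).

```python
import numpy as np

def gabor_atom(t, s, u, xi):
    """Gabor atom of Eq. (10): (1/sqrt(s)) * g((t - u)/s) * exp(j*xi*t), g(t) = exp(-pi t^2)."""
    window = np.exp(-np.pi * ((t - u) / s) ** 2) / np.sqrt(s)
    return window * np.exp(1j * xi * t)

fs = 1000.0                                   # hypothetical sampling rate in Hz
t = np.arange(1000) / fs                      # 1 s of samples
atom = gabor_atom(t, s=0.05, u=0.5, xi=2 * np.pi * 100.0)  # centred at 0.5 s, 100 Hz
atom /= np.linalg.norm(atom)                  # unit-norm version for use in a dictionary

spectrum = np.abs(np.fft.fft(atom))
freqs = np.fft.fftfreq(t.size, d=1 / fs)
print("peak time:", t[np.argmax(np.abs(atom))], "s")
print("peak frequency:", freqs[np.argmax(spectrum)], "Hz")
```

A Gabor dictionary is obtained by evaluating this function on a grid of \((s, u, \xi)\) values and stacking the normalized atoms as columns.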

Owing to its good time-frequency resolution, the Gabor dictionary has been widely used to analyze EEG signals [9], audio signals [10] and so on. However, each Gabor atom has a fixed frequency, and the dictionary tiles the time-frequency plane with a rectangular grid. As a result, it performs poorly on signals whose frequency varies with time.

(b) Chirplet dictionary

A Chirplet atom, which is built from the unit Gaussian window by dilation, translation, frequency and chirp modulation, is defined as

$$g_{\gamma } (t)=\frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {1 + r \cdot \left( {\frac{t - u}{\sigma }} \right)} \right] \cdot f_{c} \cdot \left( {\frac{t - u}{\sigma }} \right)} \right\}$$
(11)

Equation (11) can also be written as

$$g_{\gamma } (t) = \frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {f_{c} \cdot \left( {\frac{t - u}{\sigma }} \right) + \frac{1}{2}\xi \cdot \left( {\frac{t - u}{\sigma }} \right)^{2} } \right]} \right\}$$
(12)

where \(\xi = 2rf_{c}\) is the linear chirp rate and \(\gamma = \left( {\sigma ,u,\xi ,f_{c} } \right)\) is the set of atom parameters. Here \(\sigma\) is the scale parameter that controls the width of the function, \(u\) is the time center of the Chirplet function and \(f_{c}\) is its frequency center. The chirp rate \(\xi\) controls how the frequency varies with time.
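A small numerical check that Eqs. (11) and (12) describe the same atom when \(\xi = 2rf_{c}\). It assumes that the unit Gaussian window is \(g(v) = e^{-\pi v^{2}}\), as for the Gabor atom. The parameter values are hypothetical.

```python
import numpy as np

def gauss(v):
    """Unit Gaussian window, assumed to be g(v) = exp(-pi v^2) as for the Gabor atom."""
    return np.exp(-np.pi * v ** 2)

def chirplet_eq11(t, sigma, u, fc, r):
    v = (t - u) / sigma
    return gauss(v) / np.sqrt(sigma) * np.exp(1j * 2 * np.pi * (1 + r * v) * fc * v)

def chirplet_eq12(t, sigma, u, fc, xi):
    v = (t - u) / sigma
    return gauss(v) / np.sqrt(sigma) * np.exp(1j * 2 * np.pi * (fc * v + 0.5 * xi * v ** 2))

t = np.linspace(-1.0, 1.0, 2001)
sigma, u, fc, r = 0.3, 0.1, 5.0, 0.4          # hypothetical parameters
xi = 2 * r * fc                               # chirp-rate relation stated after Eq. (12)
a11 = chirplet_eq11(t, sigma, u, fc, r)
a12 = chirplet_eq12(t, sigma, u, fc, xi)
print("max |Eq.(11) - Eq.(12)|:", np.max(np.abs(a11 - a12)))

# In the normalised time v = (t - u)/sigma the phase is 2*pi*(fc*v + xi*v^2/2),
# so the instantaneous frequency fc + xi*v changes linearly with v.
```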

Different parameter values yield different Chirplet functions, which together compose the Chirplet dictionary. Because the frequency of a Chirplet atom changes linearly, the Chirplet dictionary is well suited to signals whose frequency varies linearly with time, such as radar and sonar signals [11]. However, for components whose frequency varies nonlinearly with time, the Chirplet dictionary becomes less effective and even inaccurate.

(c) FMmlet dictionary

To characterize both the time-invariant and the time-varying spectral contents of a signal, Zou et al. [12] proposed the dilated and translated windowed exponential frequency modulated functions (FMmlet). The FMmlet atom is defined as

$$g_{\gamma } (t) = \frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {1 + r\left( {\frac{t - u}{\sigma }} \right)} \right]^{m} f_{c} \left( {\frac{t - u}{\sigma }} \right)} \right\}$$
(13)

The atom is expressed by the following five parameters: scaling operator \(\sigma\), time-center \(u\), frequency center \(f_{c}\), chirp rate \(r\), and FM exponent \(m\).
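A minimal sketch of Eq. (13). As before, it assumes the unit Gaussian window \(g(v) = e^{-\pi v^{2}}\), and all parameter values are hypothetical. With \(m = 0\) the atom has a constant frequency. With \(m = 1\) it reduces to the Chirplet atom of Eq. (11). Other values of \(m\) give nonlinear frequency modulation. For non-integer \(m\), the base \(1 + r(t-u)/\sigma\) must stay positive for the real power to be defined, so the sketch checks this.

```python
import numpy as np

def fmmlet_atom(t, sigma, u, fc, r, m):
    """FMmlet atom of Eq. (13), assuming the unit Gaussian window g(v) = exp(-pi v^2)."""
    v = (t - u) / sigma
    base = 1.0 + r * v
    if float(m) != int(m) and np.any(base <= 0):
        # a real, non-integer power of a negative number is undefined
        raise ValueError("non-integer m needs 1 + r*(t - u)/sigma > 0 on the whole grid")
    window = np.exp(-np.pi * v ** 2) / np.sqrt(sigma)
    return window * np.exp(1j * 2 * np.pi * base ** m * fc * v)

t = np.linspace(0.0, 1.0, 1001)
sigma, u, fc, r = 0.2, 0.5, 4.0, 0.1    # |r*v| <= 0.25 on this grid, so the base stays positive

gabor_like = fmmlet_atom(t, sigma, u, fc, r, m=0)   # constant frequency
chirp_like = fmmlet_atom(t, sigma, u, fc, r, m=1)   # linear chirp, same as Eq. (11)
nonlinear = fmmlet_atom(t, sigma, u, fc, r, m=2.5)  # nonlinear frequency modulation
```

Because the FM exponent \(m\) is one extra parameter, the FMmlet dictionary contains the Gabor-type and Chirplet-type atoms as special cases. This explains its greater flexibility.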

The FMmlet dictionary is more flexible for signals whose spectral contents vary nonlinearly with time. It has therefore been used to process earthquake signals [13] and ECG signals [14].

2.3 Solution to Sparse Representation Model

After a suitable dictionary has been constructed, the next important task is to develop a reliable and efficient algorithm to solve the sparse representation model. Equation (9) has no straightforward solution, and a number of methods have been developed for it in recent years. These methods fall into two main categories: greedy algorithms and convex relaxation techniques. The greedy algorithms include matching pursuit (MP) [15], orthogonal matching pursuit (OMP) [16], regularized orthogonal matching pursuit (ROMP) [17], etc. The convex relaxation techniques include basis pursuit (BP) [18], basis pursuit denoising (BPDN) [19], etc. The following discussion focuses on the MP and BPDN algorithms.

(a) Matching pursuit

Introduced in Ref. [15], the matching pursuit algorithm uses a greedy heuristic to build a decomposition of the original signal iteratively. It represents a signal \(\varvec{x}\) in a Hilbert space as a weighted sum of atoms \(\varvec{\phi}_{\gamma i}\) taken from an over-complete dictionary \(\varvec{\phi}\), plus a residual:

$$\varvec{x} = \sum\limits_{i = 1}^{m} {a_{\gamma i} }\varvec{\phi}_{\gamma i} + \varvec{r}_{m}$$
(14)

where \(a_{\gamma i}\) is the weighting factor of the \(i\)-th selected atom and \(\gamma\) denotes the atom parameters. The residual after \(m\) iterations is \(\varvec{r}_{m}\), which is updated recursively as

$$\varvec{r}_{i} = \varvec{r}_{i - 1} - a_{\gamma i}\varvec{\phi}_{\gamma i}$$
(15)

with the initial residual \(\varvec{r}_{0} = \varvec{x}\). At iteration \(i\), the weighting factor \(a_{\gamma i} = \left\langle {\varvec{r}_{i - 1} ,\varvec{\phi}_{\gamma i} } \right\rangle\) is the inner product of the current residual and the selected atom, assuming unit-norm atoms.

Given a fixed over-complete dictionary, matching pursuit first selects the atom whose inner product with the signal has the largest magnitude. It subtracts that atom's contribution to obtain the residual and then repeats the selection on the residual. The process continues until a stopping criterion is met, for example a maximum number of atoms or a residual energy threshold. The procedure of matching pursuit is summarized in Table 1.

Table 1 Procedures of matching pursuit algorithm
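The procedure described above can be sketched in NumPy as follows. The sketch assumes a real-valued signal and a dictionary matrix with unit-norm columns, and the random dictionary and two-atom test signal are hypothetical. Since the table contents are not reproduced here, the stopping rule is an assumption: a maximum number of atoms, or a relative residual norm below `tol`.

```python
import numpy as np

def matching_pursuit(x, D, max_atoms=20, tol=1e-6):
    """Matching pursuit following Eqs. (14)-(15) for real signals.

    D: (N, M) dictionary whose columns are unit-norm atoms.
    Returns the accumulated coefficients (one per atom) and the final residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    coeffs = np.zeros(D.shape[1])
    x_norm = np.linalg.norm(residual)
    for _ in range(max_atoms):
        inner = D.T @ residual                    # <r_{i-1}, phi_gamma> for every atom
        k = int(np.argmax(np.abs(inner)))         # atom with the largest |inner product|
        coeffs[k] += inner[k]                     # an atom may be picked more than once
        residual -= inner[k] * D[:, k]            # Eq. (15)
        if np.linalg.norm(residual) <= tol * x_norm:
            break
    return coeffs, residual

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                    # normalise every atom
x = 2.0 * D[:, 10] - 1.5 * D[:, 100]              # hypothetical two-atom signal
coeffs, r = matching_pursuit(x, D, max_atoms=20)
print("residual energy ratio:", np.linalg.norm(r) ** 2 / np.linalg.norm(x) ** 2)
```

By construction, `D @ coeffs + r` reproduces `x` exactly, which is Eq. (14). Each iteration lowers the residual energy by the squared magnitude of the selected inner product.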
(b) Basis pursuit denoising

The \(l_{0}\)-norm is the natural measure of the sparsity of the representation coefficients. However, \(l_{0}\)-norm minimization is NP-hard because of its combinatorial nature, and it is intractable for problems of realistic size. Replacing the \(l_{0}\)-norm in Eq. (9) with the \(l_{1}\)-norm gives a convex relaxation whose solution approximates that of Eq. (9):

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{1} \quad {\text{subject}}\,{\text{to}} \quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \le \varepsilon$$
(16)

where \(\left\| \cdot \right\|_{1}\) is the \(l_{1}\)-norm, \(\left\| \varvec{c} \right\|_{1} = \sum\nolimits_{i} {\left| {c_{i} } \right|}\). Equation (16) defines basis pursuit denoising (BPDN). Donoho [19] proved that, under certain conditions (in particular, when the solution is sparse enough), the solution of Eq. (16) is equivalent to that of Eq. (9).
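To see why the \(l_{0}\) problem is combinatorial, the following sketch solves Eq. (9) by brute force. It tries every support of size 1, 2, … and keeps the first one whose least-squares fit meets the error bound. The 6 × 12 matrix and the 2-sparse test vector are hypothetical. The search succeeds on this toy size, but for \(M = 256\) atoms there are already \(\binom{256}{5} \approx 8.8 \times 10^{9}\) supports with only five non-zeros.

```python
import itertools
import math
import numpy as np

def l0_exhaustive(A, y, eps):
    """Brute-force solution of Eq. (9): test supports of size 0, 1, 2, ... in order.

    Returns the first (hence sparsest) coefficient vector with ||A c - y||_2^2 <= eps,
    or None if none exists. Only feasible for very small M.
    """
    M = A.shape[1]
    if y @ y <= eps:                       # c = 0 already satisfies the constraint
        return np.zeros(M)
    for k in range(1, M + 1):
        for support in itertools.combinations(range(M), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.sum((A[:, cols] @ coef - y) ** 2) <= eps:
                c = np.zeros(M)
                c[cols] = coef
                return c
    return None

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 12))
c_true = np.zeros(12)
c_true[[3, 8]] = [1.0, -2.0]               # hypothetical 2-sparse coefficients
y = A @ c_true
c_hat = l0_exhaustive(A, y, eps=1e-10)
print("recovered support:", np.flatnonzero(c_hat))
print("supports with 5 non-zeros among 256 atoms:", math.comb(256, 5))
```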

Equation (16) can also be written in the more general, unconstrained form

$$\hat{\varvec{c}} = \arg \mathop {\hbox{min} }\limits_{\varvec{c}} J(\varvec{c}),\qquad J(\varvec{c}) = \frac{1}{2}\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2} + \lambda \left\| \varvec{c} \right\|_{1}$$
(17)

where \(\left\| \varvec{c} \right\|_{2}^{2} = \sum\nolimits_{i} {\left| {c_{i} } \right|}^{2}\) is the squared \(l_{2}\)-norm and \(\lambda\) is a scalar regularization parameter that balances the tradeoff between the reconstruction error and the sparsity.

The right side of Eq. (17) contains two terms: the data fidelity term \(\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2}\) and the penalty term \(\left\| \varvec{c} \right\|_{1}\). Accordingly, Eq. (17) can be solved by algorithms built around either the data fidelity term or the penalty term. Both approaches are detailed in Sect. 4.

3 Over-Complete Wavelet Basis Dictionary

3.1 General Over-Complete Wavelet Basis Dictionary

The key to mechanical fault feature extraction is constructing an appropriate over-complete wavelet basis dictionary. In general, the more similar the wavelet basis is to the fault signal, the sparser the representation coefficients. In practice, the choice of wavelet basis varies from one application to another. Single-sided wavelets are often used to construct the basis matrix for bearing fault signature extraction [20]. Double-sided wavelets are usually used for processing vibration signals from faulty gears [21], and chirp signals are often used for radar signal analysis [22]. In the literature, the most widely used wavelet bases for mechanical fault feature extraction are the Laplace wavelet and the Morlet wavelet, which are introduced in detail below.

(a) Laplace wavelet basis

The Laplace wavelet is a complex, analytic, single-sided damped exponential wavelet. It was first constructed by G. Strang in 1996 [23]. Because its waveform is similar to the vibration impulse caused by bearing faults, the Laplace wavelet is often selected to construct the over-complete dictionary for bearing fault detection. As one of the most popular non-orthogonal wavelets, its real part is defined as

$$\psi \left( {f,\zeta ,\tau ,t} \right) = \psi_{\gamma } \left( t \right) = \left\{ {\begin{array}{*{20}l} {A{\text{e}}^{{\frac{ - \zeta }{{\sqrt {1 - \zeta^{2} } }}2\pi f\left( {t - \tau } \right)}} \sin 2\pi f\left( {t - \tau } \right)} \hfill & { \, t \in \left[ {\tau ,\tau + W_{s} } \right]} \hfill \\ 0 \hfill & {\text{else}} \hfill \\ \end{array} } \right.$$
(18)

where the parameter vector \(\gamma = \left( {f,\zeta ,\tau } \right)\) determines the wavelet properties. Its components are the frequency \(f \in \varvec{R}^{ + }\), the damping ratio \(\zeta \in [0,1) \subset \varvec{R}^{ + }\) and the time index \(\tau \in \varvec{R}\). The coefficient \(A\) is an arbitrary scaling factor that normalizes each wavelet to unit norm. The range \(W_{s}\) ensures that the wavelet is compactly supported with non-zero finite length, although \(W_{s}\) is generally not expressed explicitly.
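A minimal sketch of Eq. (18). It uses a hypothetical sampling rate, parameters and support length `ws`, standing in for \(W_{s}\). The wavelet is zero before \(\tau\) and after \(\tau + W_{s}\), and it is scaled to unit norm, which is the role of the coefficient \(A\).

```python
import numpy as np

def laplace_wavelet(t, f, zeta, tau, ws):
    """Real Laplace wavelet of Eq. (18), scaled to unit l2-norm (the role of A).

    Non-zero only on tau <= t <= tau + ws; ws is an assumed support length.
    """
    dt = t - tau
    on_support = (dt >= 0) & (dt <= ws)
    psi = np.zeros_like(t, dtype=float)
    decay = zeta / np.sqrt(1.0 - zeta ** 2) * 2 * np.pi * f * dt[on_support]
    psi[on_support] = np.exp(-decay) * np.sin(2 * np.pi * f * dt[on_support])
    norm = np.linalg.norm(psi)
    return psi / norm if norm > 0 else psi

fs = 20000.0                          # hypothetical sampling rate (Hz)
t = np.arange(2000) / fs              # 0.1 s of samples
psi = laplace_wavelet(t, f=3000.0, zeta=0.05, tau=0.02, ws=0.01)
```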

(b) Morlet wavelet basis

Morlet wavelet is one of the most popular non-orthogonal wavelets, defined in the time domain as a harmonic wave multiplied by a Gaussian time domain window

$$\psi \left( t \right) = \exp \left( { - \frac{{\beta^{2} t^{2} }}{2}} \right)\cos \left( {\pi t} \right)$$
(19)

It is a cosine signal that decays on both sides under a Gaussian envelope. This makes its shape very similar to that of an impulse caused by a localized gear defect at constant speed. Therefore, the Morlet wavelet is often selected to build the over-complete dictionary when extracting gear fault features. To simplify the parameterization, the Morlet wavelet is written in the following parametric form:

$$\psi \left( {f,\zeta ,\tau ,t} \right) = \psi_{\gamma } \left( t \right) = A{\text{e}}^{{\frac{ - \zeta }{{\sqrt {1 - \zeta^{2} } }}[2\pi f\left( {t - \tau } \right)]^{2} }} \cos 2\pi f\left( {t - \tau } \right) \,$$
(20)

where the parameter vector \(\gamma = \left( {f,\zeta ,\tau } \right)\) also determines the wavelet properties. These parameters \(\left( {f,\zeta ,\tau } \right)\) denote frequency \(f \in \varvec{R}^{ + }\), damping ratio \(\zeta \in \left[ {0,1} \right) \subset \varvec{R}^{ + }\), and time index \(\tau \in \varvec{R}\), respectively. The parameter A is used to normalize the wavelet function.
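A minimal sketch of Eq. (20), with the atom scaled to unit norm in place of \(A\). The values \(f = 272\) Hz and \(\zeta = 0.0074\) are borrowed from the gearbox case in Sect. 5.1 purely for illustration, and the time grid is hypothetical. Unlike the Laplace wavelet, the Morlet atom is symmetric about \(\tau\) and peaks there.

```python
import numpy as np

def morlet_wavelet(t, f, zeta, tau):
    """Parametric Morlet wavelet of Eq. (20), scaled to unit l2-norm (the role of A)."""
    arg = 2 * np.pi * f * (t - tau)
    psi = np.exp(-zeta / np.sqrt(1.0 - zeta ** 2) * arg ** 2) * np.cos(arg)
    return psi / np.linalg.norm(psi)

t = np.linspace(-0.02, 0.02, 401)     # hypothetical symmetric time grid (s)
psi = morlet_wavelet(t, f=272.0, zeta=0.0074, tau=0.0)
```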

Let the discrete parameters \(f\), \(\zeta\) and \(\tau\) take values in the sets \(F\), \(Z\) and \(T_{c}\), respectively:

$$\begin{aligned} F & { = }\left\{ {f_{1} ,f_{2} , \ldots ,f_{i} } \right\} \subset R \\ Z & = \left\{ {\zeta_{1} ,\zeta_{2} , \ldots ,\zeta_{j} } \right\} \subset R^{ + } \cap \left[ {0,1} \right) \\ T_{c} & = \left\{ {\tau_{1} ,\tau_{2} , \ldots ,\tau_{k} } \right\} \subset R \\ \end{aligned}$$
(21)

Using all combinations of these parameters, the dictionary is constructed as

$$\Psi = \left\{ {\psi_{\gamma } \left( t \right){:} \, \gamma \in F \times Z \times T_{c} } \right\} = \left\{ {\psi \left( {f,\zeta ,\tau ,t} \right){:} \, f \in F,\zeta \in Z,\tau \in T_{c} } \right\}$$
(22)

Each item in the dictionary is called an atom. In this way, the over-complete dictionary has been constructed systematically.
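Equation (22) can be sketched as a grid over \(F \times Z \times T_{c}\) that stacks one unit-norm Morlet atom per parameter triple as a matrix column. The 300-sample segment and the grids are hypothetical. The 3000 Hz sampling rate is the one used in Sect. 5.1. With one time shift per sample, the 300 × 2700 matrix is over-complete.

```python
import itertools
import numpy as np

def morlet_atom(t, f, zeta, tau):
    """Unit-norm parametric Morlet atom, Eq. (20)."""
    arg = 2 * np.pi * f * (t - tau)
    psi = np.exp(-zeta / np.sqrt(1.0 - zeta ** 2) * arg ** 2) * np.cos(arg)
    return psi / np.linalg.norm(psi)

def build_dictionary(t, F, Z, Tc, atom=morlet_atom):
    """Stack one atom per gamma in F x Z x Tc (Eq. (22)) as the columns of a matrix."""
    params = list(itertools.product(F, Z, Tc))
    D = np.column_stack([atom(t, f, zeta, tau) for f, zeta, tau in params])
    return D, params

fs = 3000.0                                   # sampling rate used in Sect. 5.1
t = np.arange(300) / fs                       # hypothetical 300-sample segment
F = [250.0, 272.0, 300.0]                     # hypothetical frequency grid (Hz)
Z = [0.005, 0.0074, 0.01]                     # hypothetical damping-ratio grid
Tc = t                                        # one time shift per sample
D, params = build_dictionary(t, F, Z, Tc)
print(D.shape)                                # 300 rows, 3 * 3 * 300 = 2700 columns
```

The number of atoms is the product of the grid sizes, so fine grids in all three parameters quickly make the dictionary large. This is one motivation for the correlation filtering step that follows.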

3.2 Correlation Filtering

Once a suitable wavelet basis (Laplace wavelet, Morlet wavelet or others) has been chosen, correlation filtering is applied to identify the wavelet atom with the optimal parameter set \(\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)\), that is, the atom most similar to the transient impulses caused by a localized fault.

Correlation, measured by the inner product, quantifies the similarity between a wavelet atom and the original signal. The correlation coefficient \(c_{\gamma }\) between the atom \(\psi_{\gamma } \left( t \right)\) and the original signal \(x\left( t \right)\) is defined as

$$c_{\gamma } = \left| {\cos \theta } \right| = \frac{{\left| {\left\langle {\psi_{\gamma } \left( t \right),x\left( t \right)} \right\rangle } \right|}}{{\left\| {\psi_{\gamma } \left( t \right)} \right\|_{2} \left\| {x\left( t \right)} \right\|_{2} }}$$
(23)

where \(\theta\) is the angle between \(\psi_{\gamma } \left( t \right)\) and \(x\left( t \right)\). The smaller the angle, the more similar the basis \(\psi_{\gamma } \left( t \right)\) is to the original signal. Therefore, the optimal wavelet atom with parameters \(\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)\) can be found by maximizing \(c_{\gamma }\) over the constructed Laplace or Morlet wavelet dictionary. For a given time index \(\tau\), the peak of \(c_{\gamma }\) over the frequency and damping-ratio grids is

$$k_{r} \left( \tau \right) = \mathop {\hbox{max} }\limits_{{f \in \mathbf{F},\,\zeta \in \mathbf{Z}}} c_{\gamma } = c\left( {\bar{f},\bar{\zeta },\tau } \right)$$
(24)

and the optimal time index \(\bar{\tau }\) is obtained by maximizing \(k_{r} \left( \tau \right)\) over \(\tau \in T_{c}\). Once correlation filtering has found the optimal parameters \(\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)\), the optimal wavelet atom can be constructed from them.
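A grid-search sketch of Eqs. (23) and (24). For each \(\tau\) it finds the best \((f, \zeta)\) pair, records the peak \(k_{r}(\tau)\), and then picks \(\bar{\tau}\) as the maximizer of \(k_{r}\). The correlation is the normalized \(|\cos\theta|\), that is, the magnitude of the inner product divided by the product of the two norms. The test signal (one Morlet transient plus weak white noise) and the parameter grids are hypothetical.

```python
import numpy as np

def morlet_atom(t, f, zeta, tau):
    """Parametric Morlet wavelet of Eq. (20), without normalisation."""
    arg = 2 * np.pi * f * (t - tau)
    return np.exp(-zeta / np.sqrt(1.0 - zeta ** 2) * arg ** 2) * np.cos(arg)

def correlation_filter(x, t, F, Z, Tc, atom=morlet_atom):
    """Grid search of Eqs. (23)-(24): returns (f_bar, zeta_bar, tau_bar) and k_r(tau)."""
    x_norm = np.linalg.norm(x)
    k_r = np.empty(len(Tc))
    best_fz = []
    for j, tau in enumerate(Tc):
        peak, arg_fz = -1.0, None
        for f in F:
            for zeta in Z:
                psi = atom(t, f, zeta, tau)
                c = abs(psi @ x) / (np.linalg.norm(psi) * x_norm)   # |cos(theta)|
                if c > peak:
                    peak, arg_fz = c, (f, zeta)
        k_r[j] = peak                                # Eq. (24)
        best_fz.append(arg_fz)
    j_best = int(np.argmax(k_r))                     # tau_bar maximises k_r(tau)
    return (*best_fz[j_best], Tc[j_best]), k_r

fs = 3000.0
t = np.arange(300) / fs
rng = np.random.default_rng(3)
clean = morlet_atom(t, 272.0, 0.0074, t[150])        # hypothetical transient at t[150]
x = 5.0 * clean / np.linalg.norm(clean) + 0.02 * rng.standard_normal(t.size)
params, k_r = correlation_filter(x, t, F=[200.0, 272.0, 350.0],
                                 Z=[0.003, 0.0074, 0.02], Tc=t[140:161])
print("f_bar, zeta_bar, tau_bar:", params)
```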

4 Solution to Representation Coefficients Based on BPDN

4.1 Data Fidelity Optimization

To represent the fault transients by sparse coefficients, the basis pursuit denoising problem in Eq. (17) must be solved. The sparse representation vector c is obtained by minimizing the objective function J(c), which requires an iterative algorithm. Traditional gradient-based methods, such as the iterative shrinkage/thresholding algorithm (ISTA) [24] and the fast IST algorithm (FISTA) [25], suffer from slow convergence. To speed up convergence, the split augmented Lagrangian shrinkage algorithm (SALSA) was proposed; it makes use of the Hessian of the data fidelity term [26]. SALSA updates c iteratively until it converges to the minimizer \({\hat{\varvec{c}}}\) of J(c).

Because the objective function in Eq. (17) is the sum of two functions, the unconstrained problem can be written as

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\{ {f_{1} \left( \varvec{c} \right) + f_{2} \left( \varvec{c} \right)} \right\}$$
(25)

where \(f_{1} \left( \varvec{c} \right) = \frac{1}{2}\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2}\) and \(f_{2} \left( \varvec{c} \right) = \lambda \left\| \varvec{c} \right\|_{1}\). Variable splitting then introduces a new variable u to serve as the argument of \(f_{1}\), under the constraint u = c. This leads to the constrained problem

$$\mathop {\hbox{min} }\limits_{{\varvec{u},\varvec{c}}} \left\{ {f_{1} \left( \varvec{u} \right) + f_{2} (\varvec{c})} \right\} \quad {\text{s}}.{\text{t}}. \quad \varvec{u} = \varvec{c}$$
(26)

which is clearly equivalent to the unconstrained problem in Eq. (25). It can be written in the linearly constrained form to which the augmented Lagrangian method applies:

$$\mathop {\hbox{min} }\limits_{z} \, E\left( \varvec{z} \right)\quad {\text{ s}} . {\text{t}} .\quad \varvec{Hz} - \varvec{b} = {\mathbf{0}}$$
(27)

where \(E\left( \varvec{z} \right) = f_{1} \left( \varvec{u} \right) + f_{2} \left( \varvec{c} \right)\), \(\varvec{z} = \left[ \begin{aligned} \varvec{u} \hfill \\ \varvec{c} \hfill \\ \end{aligned} \right]\), \(\varvec{b} = {\mathbf{0}}\), \(\varvec{H} = [\varvec{I} \, - \varvec{I}]\). The augmented Lagrangian function for this problem is defined as

$$L\left( {\varvec{z},\lambda ,\mu } \right) = E\left( \varvec{z} \right) + \lambda^{T} (\varvec{Hz} - \varvec{b}) + \frac{\mu }{2}\left\| {\varvec{Hz} - \varvec{b}} \right\|_{2}^{2}$$
(28)

where \(\lambda\) is a vector of Lagrange multipliers (not to be confused with the regularization parameter \(\lambda\) of Eq. (17)) and \(\mu > 0\) is the penalty parameter. Applying the augmented Lagrangian method (ALM) to \(L\left( {\varvec{z},\lambda ,\mu } \right)\) and replacing the multipliers by the scaled vector \(\varvec{d}^{(k)} = - \lambda^{(k)} /\mu\) gives the iterations

$$\varvec{z}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{z} \left\{ {E\left( \varvec{z} \right) + \frac{\mu }{2}\left\| {\varvec{Hz} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$
(29)
$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{(k)} - (\varvec{Hz}^{(k + 1)} - \varvec{b})$$
(30)

where k is the iteration counter. As in SALSA, the minimization in Eq. (29) is performed alternately over u and c, with one sweep per iteration. Substituting the concrete forms of E(z), the matrix H and the vector b gives

$$\varvec{u}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{\varvec{u}} \left\{ {f_{1} \left( \varvec{u} \right) + \frac{\mu }{2}\left\| {\varvec{u} - \varvec{c}^{\left( k \right)} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\} = \arg \mathop {\text{min}}\limits_{\varvec{u}} \left\{ {\frac{1}{2}\left\| {\varvec{y} - \varvec{Au}} \right\|_{2}^{2} + \frac{\mu }{2}\left\| {\varvec{u} - \varvec{c}^{\left( k \right)} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$
(31)
$$\varvec{c}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{\varvec{c}} \left\{ {f_{2} \left( \varvec{c} \right) + \frac{\mu }{2}\left\| {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{c} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\} = \arg \mathop {\text{min}}\limits_{c} \left\{ {\lambda \left\| \varvec{c} \right\|_{1} + \frac{\mu }{2}\left\| {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{c} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$
(32)
$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{(k)} - (\varvec{u}^{(k + 1)} - \varvec{c}^{{\left( {k + 1} \right)}} )$$
(33)

The objective in Eq. (31) is a strictly convex quadratic, so \(\varvec{u}^{(k+1)}\) has a closed form. The minimization in Eq. (32) is solved exactly by soft thresholding. The SALSA iterations are therefore

$$\varvec{u}^{{\left( {k + 1} \right)}} = \left( {\varvec{A}^{H} \varvec{A} + \mu \varvec{I}} \right)^{ - 1} \left( {\varvec{A}^{H} \varvec{y} + \mu (\varvec{c}^{(k)} + \varvec{d}^{(k)} )} \right)$$
(34)
$$\varvec{c}^{{\left( {k + 1} \right)}} = \operatorname{soft}\left( {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{d}^{(k)} ,\frac{\lambda }{\mu }} \right)$$
(35)
$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{\left( k \right)} - \varvec{u}^{{\left( {k + 1} \right)}} + \varvec{c}^{{\left( {k + 1} \right)}}$$
(36)

Here soft(v, T) denotes element-wise soft thresholding. It sets entries with \(|v_{i}| \le T\) to zero and shrinks the others toward zero by T; for real data this is \({\text{sign}}(v_{i})\max (|v_{i}| - T,0)\). Iterating Eqs. (34)-(36) until convergence yields the sparse solution \({\hat{\varvec{c}}}\), from which the signal is reconstructed as \(\hat{\varvec{x}} = \varvec{A}{{\hat{\varvec{c}}}}\). The non-zero coefficients of \({\hat{\varvec{c}}}\) occur in periodically repeating groups that represent the transients in the original signal.
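The SALSA iterations of Eqs. (34)-(36) can be sketched in NumPy as follows. This is a minimal sketch, not the reference implementation of [26]. It assumes zero initialization, a fixed iteration count and an explicit inverse of \(\varvec{A}^{H}\varvec{A} + \mu\varvec{I}\), which is acceptable only for small M; the dictionary, sparse truth, \(\lambda\) and \(\mu\) are hypothetical.

```python
import numpy as np

def soft(v, T):
    """Element-wise soft thresholding; works for real and complex v."""
    mag = np.abs(v)
    scale = np.maximum(mag - T, 0.0) / np.where(mag > 0, mag, 1.0)
    return scale * v

def salsa(y, A, lam, mu, n_iter=300):
    """SALSA iterations of Eqs. (34)-(36) for J(c) = 0.5||y - Ac||^2 + lam ||c||_1."""
    M = A.shape[1]
    AH = A.conj().T
    # (A^H A + mu I) never changes, so its inverse is computed once (fine for small M).
    P = np.linalg.inv(AH @ A + mu * np.eye(M))
    AHy = AH @ y
    c = np.zeros(M, dtype=np.result_type(A, y))
    d = np.zeros_like(c)
    for _ in range(n_iter):
        u = P @ (AHy + mu * (c + d))            # Eq. (34)
        c = soft(u - d, lam / mu)               # Eq. (35)
        d = d - u + c                           # Eq. (36)
    return c

def bpdn_objective(c, y, A, lam):
    """J(c) of Eq. (17)."""
    r = y - A @ c
    return 0.5 * np.vdot(r, r).real + lam * np.sum(np.abs(c))

rng = np.random.default_rng(4)
A = rng.standard_normal((64, 256)) / 8.0        # hypothetical dictionary
c_true = np.zeros(256)
c_true[rng.choice(256, 5, replace=False)] = 3.0 # hypothetical 5-sparse truth
y = A @ c_true + 0.05 * rng.standard_normal(64)
c_hat = salsa(y, A, lam=0.1, mu=1.0)
print("non-zeros:", np.count_nonzero(c_hat), "objective:", bpdn_objective(c_hat, y, A, lam=0.1))
```

When \(\varvec{A}\) is the identity, BPDN has the closed-form solution soft(y, \(\lambda\)). This makes it a convenient sanity check for an implementation like this one. For large dictionaries, the inverse would normally be replaced by a factorization or by the matrix inversion lemma.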

4.2 Penalty Optimization

Unlike SALSA, the majorization-minimization (MM) algorithm solves Eq. (17) by working on the penalty term. By majorizing the non-quadratic penalty, MM replaces the original ill-posed inverse problem with a sequence of simpler convex optimization problems. It is an effective and widely applicable method [27].

If J(c) were quadratic, it could be minimized easily. The MM algorithm exploits this by solving a sequence of simpler minimization problems

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} G_{k} \left( c \right)$$
(37)

where k is the iteration counter, k = 1, 2, 3, …. The MM algorithm requires that each function \(G_{k} \left( c \right)\) should be a majorizer (upper bound) of J(c) and it coincides with J(c) at \(c = c_{k}\). That is

$$\begin{aligned} & \forall c,G_{k} \left( c \right) \ge J\left( c \right) \\ & G_{k} \left( {c_{k} } \right) = J\left( {c_{k} } \right) \\ \end{aligned}$$
(38)

The majorizer should be chosen so that it is easier to minimize than J(c). Because the data fidelity term in the cost function (17) is already quadratic, only the penalty term needs to be majorized.

We denote the penalty \(\left\| c \right\|_{1}\) by \(\Psi \left( c \right)\) and the scalar function \(\left| c \right|\) by \(\phi \left( c \right)\), so that \(\Psi \left( c \right) = \sum\nolimits_{n = 1}^{M} {\left| {c\left( n \right)} \right|} = \sum\nolimits_{n = 1}^{M} {\phi \left( {c\left( n \right)} \right)}\). Because \(\phi \left( c \right)\) is the absolute value function, it is non-differentiable at \(c = 0\) and not strictly convex, which makes Eq. (17) difficult to solve directly. Following the MM principle in Eq. (38), \(\phi \left( c \right)\) can be majorized by a quadratic function \(g\left( c \right)\) of the general form

$$g\left( c \right) = mc^{2} + nc + b$$
(39)

where m, n and b are constants. The majorizer \(g\left( c \right)\) must be an upper bound of \(\phi \left( c \right)\) that coincides with \(\phi \left( c \right)\) at a specified point \(c_{k}\). For a quadratic majorizer and a point \(c_{k} \ne 0\), where \(\phi \left( c \right)\) is differentiable, the conditions in Eq. (38) require

$$\begin{aligned} g\left( {c_{k} } \right) & = \phi \left( {c_{k} } \right) \\ g^{\prime } \left( {c_{k} } \right) & = \phi^{\prime } \left( {c_{k} } \right) \\ \end{aligned}$$
(40)

Solving Eq. (40) for m and b, with n left free, gives \(m = \frac{{\phi^{\prime} \left( {c_{k} } \right) - n}}{{2c_{k} }}\) and \(b = \phi \left( {c_{k} } \right) - \frac{{c_{k} }}{2}\phi^{\prime} \left( {c_{k} } \right) - \frac{n}{2}c_{k}\). Substituting these into Eq. (39) gives the majorizer

$$g\left( c \right) = \left( {\frac{{\phi^{\prime } \left( {c_{k} } \right)}}{{2c_{k} }} - \frac{n}{{2c_{k} }}} \right)c^{2} + nc + \left( {\phi \left( {c_{k} } \right) - \frac{{c_{k} }}{2}\phi^{\prime } \left( {c_{k} } \right) - \frac{n}{2}c_{k} } \right)$$
(41)

Choosing n = 0, which makes \(g\left( c \right)\) symmetric about \(c = 0\), the parameters become \(m = \phi^{\prime } \left( {c_{k} } \right)/\left( {2c_{k} } \right)\) and \(b = \phi \left( {c_{k} } \right) - \left( {c_{k} /2} \right)\phi^{\prime } \left( {c_{k} } \right)\), and \(g\left( c \right)\) reduces to:

$$g\left( c \right) = \frac{{\phi^{\prime } \left( {c_{k} } \right)}}{{2c_{k} }}c^{2} + \phi \left( {c_{k} } \right) - \frac{{c_{k} }}{2}\phi^{\prime } \left( {c_{k} } \right)$$
(42)
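For \(\phi(c) = |c|\) and \(c_{k} \ne 0\), Eq. (42) gives \(g(c) = c^{2}/(2|c_{k}|) + |c_{k}|/2\). The gap \(g(c) - |c| = (|c| - |c_{k}|)^{2}/(2|c_{k}|)\) is never negative and vanishes at \(c = \pm c_{k}\). This quick numerical check (with hypothetical values of \(c_{k}\)) confirms that both conditions of Eq. (38) hold.

```python
import numpy as np

def abs_majorizer(c, ck):
    """Eq. (42) with phi(c) = |c| and ck != 0: g(c) = c^2 / (2|ck|) + |ck| / 2."""
    return c ** 2 / (2.0 * abs(ck)) + abs(ck) / 2.0

c = np.linspace(-5.0, 5.0, 1001)
for ck in (-2.0, 0.5, 3.0):
    gap = abs_majorizer(c, ck) - np.abs(c)
    # gap = (|c| - |ck|)^2 / (2|ck|) >= 0, with equality at c = +/- ck
    print(f"ck={ck}: min gap = {gap.min():.3g}, g(ck) - |ck| = {abs_majorizer(ck, ck) - abs(ck):.3g}")
```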

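For \(\phi \left( c \right) = \left| c \right|\) and \(c_{k} \ne 0\), \(\phi^{\prime } \left( {c_{k} } \right) = {\text{sign}}\left( {c_{k} } \right)\), so Eq. (42) becomes \(g\left( c \right) = c^{2} /\left( {2\left| {c_{k} } \right|} \right) + \left| {c_{k} } \right|/2\). Applying this majorizer to every element of c and adding the data fidelity term gives the majorizer of J(c) in matrix form: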

$$G_{k} \left( c \right) = \frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \lambda \left( {\frac{1}{2}c^{*}\Lambda _{k}^{ - 1} c + \frac{1}{2}\left\| {c_{k} } \right\|_{1} } \right)$$
(43)

where \(\Lambda _{k}\) is the diagonal matrix with the vector \(\left| {c_{k} } \right|\) on its diagonal. The MM update (37) then becomes

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} \left[ {\frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \lambda \left( {\frac{1}{2}c^{*}\Lambda _{k}^{ - 1} c{ + }\frac{1}{2}\left\| {c_{k} } \right\|_{1} } \right)} \right]$$
(44)

The last term in Eq. (44) does not depend on c and can be omitted, so the update becomes

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} \frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \frac{\lambda }{2}c^{*} {\Uplambda }_{k}^{ - 1} c$$
(45)

Equation (45) is quadratic in terms of c, so the solution to this problem can be written explicitly using linear algebra as:

$$c_{k + 1} = \left( {A^{*} A + \lambda {\Uplambda }_{k}^{ - 1} } \right)^{ - 1} A^{*} y$$
(46)

Because c is sparse, some elements of \(c_{k}\) approach zero as the iterations proceed, so the corresponding diagonal elements of \({\Uplambda }_{k}^{ - 1}\) tend to infinity. To avoid this numerical problem, the matrix inversion lemma is applied, which gives an update in terms of \({\Uplambda }_{k}\) instead of \({\Uplambda }_{k}^{ - 1}\):

$$c_{k + 1} = \frac{1}{\lambda }\Uplambda_{k} \left[ {A^{*} y - A^{*} \left( {A\Uplambda_{k} A^{*} + \lambda I} \right)^{ - 1} A\Uplambda_{k} A^{*} y} \right]$$
(47)

By the iterative procedure in Eq. (47) of MM algorithm, the optimal sparse solution \({\hat{\varvec{c}}}\), which is used to represent the fault feature, can be found. With the sparse solution \({\hat{\varvec{c}}}\), the reconstructed signal can be represented as \(\hat{x} = \varvec{A} \hat {\varvec{c}}\).
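A minimal sketch of the MM iteration in Eq. (47). It uses the algebraically equivalent form \(c_{k+1} = \Lambda_{k} A^{*} (\lambda I + A \Lambda_{k} A^{*})^{-1} y\), which follows from Eq. (47) by factoring out \((A\Lambda_{k}A^{*} + \lambda I)^{-1}\). Only an N × N system has to be solved per iteration, and \(\Lambda_{k}^{-1}\) is never formed. The initialization \(c_{0} = A^{*}y\), the iteration count and all data are assumptions. Once an element of c reaches zero, it stays zero in this scheme, so the initial vector should have no zero entries where non-zeros are expected.

```python
import numpy as np

def mm_bpdn(y, A, lam, n_iter=100, c0=None):
    """MM iterations of Eq. (47) for J(c) = 0.5||y - Ac||^2 + lam ||c||_1.

    Uses the equivalent form c = Lambda A^H (lam I + A Lambda A^H)^{-1} y,
    which needs only an N x N solve. Returns c and J(c) after each iteration.
    """
    AH = A.conj().T
    N = A.shape[0]
    c = AH @ y if c0 is None else np.array(c0, copy=True)   # assumed initialisation
    history = []
    for _ in range(n_iter):
        lam_diag = np.abs(c)                        # diagonal of Lambda_k, stored as a vector
        G = (A * lam_diag) @ AH                     # A Lambda_k A^H (scales column j by |c_j|)
        c = lam_diag * (AH @ np.linalg.solve(lam * np.eye(N) + G, y))
        r = y - A @ c
        history.append(0.5 * np.vdot(r, r).real + lam * np.sum(np.abs(c)))
    return c, history

rng = np.random.default_rng(5)
A = rng.standard_normal((32, 96)) / np.sqrt(32)    # hypothetical dictionary
c_true = np.zeros(96)
c_true[[5, 40, 77]] = [2.0, -1.5, 1.0]             # hypothetical sparse truth
y = A @ c_true + 0.01 * rng.standard_normal(32)
c_hat, J = mm_bpdn(y, A, lam=0.05, n_iter=200)
print("final objective:", J[-1])
```

Because each \(G_{k}\) majorizes J and touches it at \(c_{k}\), the recorded objective values should never increase. This is a useful check on any MM implementation.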

5 Applications

5.1 Application in Gearbox Transient Feature Extraction

To verify the effectiveness of the proposed methods for gearbox fault diagnosis, experimental data were acquired from an automobile transmission gearbox with five forward speeds and one reverse speed. The structure of the gearbox is shown in Fig. 1. During the test, a broken-tooth fault occurred on the driving gear of the third speed. The vibration signal was acquired by an accelerometer mounted on the outer case of the gearbox while the gearbox was operating at the third speed.

Fig. 1

The automobile transmission gearbox: a the structure of gearbox; and b gearbox setup

For a gear transmission, the meshing frequency \(f_{m}\) is calculated by

$$f_{m} = \frac{nz}{60i}$$
(48)

where \(z\) is the number of gear teeth, \(n\) is the rotating speed of the input shaft in rpm, and \(i\) is the transmission ratio. In this test, n was set to 1600 ± 16 rpm, and the nominal value of 1600 rpm is used as the input shaft speed. The meshing frequency of the third-speed gears is then 500 Hz. The sampling frequency was 3000 Hz. The working parameters are listed in Table 2.
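The arithmetic behind these numbers can be checked with a short Python sketch. The helper names are hypothetical, and since the tooth count \(z\) and ratio \(i\) from Table 2 are not reproduced here, the example uses only quantities stated in the text: \(n = 1600 \pm 16\) rpm, \(f_{m} = 500\) Hz, \(f_{s} = 3000\) Hz, and the 50 ms theoretical fault period used later in this subsection.

```python
def mesh_frequency(n_rpm, z, i):
    """Eq. (48): meshing frequency in Hz for input speed n_rpm (rpm), z teeth and transmission ratio i."""
    return n_rpm * z / (60.0 * i)


def period_ms(f_hz):
    """Period, in milliseconds, of a component at f_hz."""
    return 1000.0 / f_hz


def samples_per_period(fs_hz, f_hz):
    """Number of samples per period when sampling at fs_hz."""
    return fs_hz / f_hz


f_m = 500.0   # meshing frequency quoted in the text
fs = 3000.0   # sampling frequency quoted in the text

# f_m is proportional to n, so the +/-16 rpm tolerance (1 % of 1600 rpm) gives about +/-1 % in f_m.
# z = 1 and i = 1 are placeholders only: the ratio does not depend on them.
rel = mesh_frequency(1616, 1, 1) / mesh_frequency(1600, 1, 1) - 1.0
print(round(rel * f_m, 6))                 # 5.0  -> f_m = 500 +/- 5 Hz
print(period_ms(f_m))                      # 2.0  -> meshing period T0 = 2 ms
print(samples_per_period(fs, f_m))         # 6.0  -> samples per meshing period
print(900 * 1000.0 / fs / 50.0)            # 6.0  -> 50 ms fault periods in a 900-sample record
```

So a 900-sample record at 3000 Hz spans 300 ms, enough for about six fault impulses, while each meshing cycle is covered by only six samples.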

Table 2 Working parameters of the third speed gears
(a) Sparsity-based fault feature extraction by optimizing the data fidelity term

A measured vibration signal with a length of 900 samples and its Fourier spectrum are shown in Fig. 2a, b. The fault feature of the gearbox vibration signal cannot be identified from Fig. 2a. From Fig. 2b, the main frequency component can be identified as 500 Hz, which is in fact the meshing frequency.

Fig. 2

a The measured vibration signal and b its Fourier spectrum

Because the Morlet wavelet resembles the impulse caused by a localized gear defect, it is selected as the atom to construct the over-complete dictionary, from which the sparse representation model is established. The SALSA algorithm is then applied to solve the model. Figure 3 presents the analysis result of the vibration signal obtained by the proposed method. Figure 3a displays the optimal wavelet atom obtained by correlation filtering, with parameters \(\bar{f} = 272\) Hz, \(\bar{\zeta } = 0.0074\) and \(\bar{\tau } = 0.0633\) s. The first N elements of the representation coefficient vector \({\hat{\mathbf{c}}}\) are given in Fig. 3b. A threshold of \(3\sigma\) is used to remove the small values and retain the principal components; the resulting estimated vector \(\hat{\mathbf{c}}^{\prime}\) is illustrated in Fig. 3c. The cyclic period \(\hat{T} = 50.00\) ms can be easily identified in Fig. 3c, consistent with the theoretical value \(T = 50.00\) ms, and the impulse times can also be identified from Fig. 3c. The parameter values used in this case are listed in Table 3.
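The thresholding and period-reading steps can be sketched as follows. Two assumptions are made because the text does not fix them: \(\sigma\) is taken as the standard deviation of the coefficient vector, and each coefficient is assumed to be indexed by the time shift (in samples) of its atom, so impulse times are index\(/f_{s}\). The function name and the synthetic test vector are hypothetical.

```python
import numpy as np


def threshold_and_period(c, fs, k_sigma=3.0):
    """Zero coefficients below k_sigma*std(c) and estimate the impulse period.

    Assumes c[j] belongs to the atom delayed by j samples and that each impulse
    leaves a single dominant coefficient (hypothetical layout).
    Returns the filtered vector c', the impulse times in s and their mean spacing in s.
    """
    c = np.asarray(c, dtype=float)
    thr = k_sigma * np.std(c)                  # sigma taken as std of c (assumption)
    c_f = np.where(np.abs(c) > thr, c, 0.0)    # filtered sparse coefficients c'
    idx = np.flatnonzero(c_f)                  # surviving shifts
    times = idx / fs
    period = float(np.mean(np.diff(times))) if idx.size > 1 else float("nan")
    return c_f, times, period


# Usage on a synthetic vector: six unit impulses 150 samples (50 ms at 3000 Hz) apart plus small noise.
rng = np.random.default_rng(0)
c = 0.01 * rng.standard_normal(900)
c[10::150] += 1.0
c_f, times, T = threshold_and_period(c, fs=3000.0)
print(round(T * 1000.0, 2))   # 50.0
```

In real coefficient vectors an impulse may spread over a few neighbouring shifts, in which case peak picking within each cluster would be needed before averaging the spacing.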

Fig. 3

The representation results of the vibration signal by optimizing the data fidelity term: a the optimal wavelet atom; b the representation coefficients; c the filtered sparse coefficients; and d the meshing period

Table 3 Summary of the parameters of the transient components in the vibration signal of the faulty gearbox

The small coefficients removed by thresholding, which represent the meshing component, are shown in Fig. 3d. The meshing period \(T_{0} = 0.002\) s observed in Fig. 3d corresponds to a meshing frequency of 500 Hz, consistent with the theoretical meshing frequency of 500 Hz. These results show that the proposed method can identify both the impulse occurrence times and the period parameter.

A comparison between the reconstructed impulse responses and the original signal is presented in Fig. 4. The interval between the transients is consistent with the rotating period of the third-speed gears, which indicates a localized fault in the third-speed gear of the gearbox. An overhaul subsequently confirmed that the driving gear of the third speed was broken.

Fig. 4

The comparison between a the original vibration signal and b the reconstructed signal

(b) Sparsity-based fault feature extraction by optimizing the penalty term

Another vibration signal of 900 samples was measured on the same gearbox; the signal and its frequency spectrum are shown in Fig. 5a, b. In Fig. 5a, the impulse period cannot be identified because of noise corruption; in Fig. 5b, the main frequency component is identified as 500 Hz.

Fig. 5

a The measured gearbox defective vibration signal; and b its Fourier spectrum

First, the optimal Morlet wavelet atom is obtained by correlation filtering and used to construct the over-complete dictionary, from which the sparse representation model is built. The MM algorithm is then applied to solve the model by optimizing the penalty term. The parameters of the optimal wavelet atom are \(\bar{f} = 272\) Hz, \(\bar{\zeta } = 0.0074\) and \(\bar{\tau } = 0.0633\) s. Figure 6a shows the sparse coefficients obtained by the proposed method, which represent a series of periodic impulses. The average period of these impulses is about 0.0505 s, very close to the theoretical value of 0.050 s. Figure 6b illustrates the reconstructed signal, whose periodic features reflect the localized fault in the driving gear of the third speed. These results demonstrate that the proposed transient sparse representation method can extract the transients and suppress the noise effectively, so that the machinery condition can be identified.

Fig. 6

The analysis results of vibration signal by optimizing the penalty term: a sparse coefficients; and b the reconstructed signal

5.2 Application in Bearing Transient Feature Extraction

To verify the effectiveness of the proposed method for bearing fault diagnosis, experimental data were acquired from the rotating machine test rig shown in Fig. 7, with a sampling frequency of 51.2 kHz. The test rig consists of a driving motor and a shaft, which is driven by the motor and supported by two bearing blocks. The bearing used in this test is an NJ208 (TMB) cylindrical roller bearing; its geometry and fault frequencies are listed in Table 4. With the shaft rotating at 1496 rpm, the fault frequencies of the outer race, the inner race and the rolling element are 142.8 Hz (7.003 ms), 206.3 Hz (4.847 ms) and 132.6 Hz (7.541 ms), respectively.
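Table 4 is not reproduced here, but the quoted fault frequencies can be related to the bearing geometry through the textbook kinematic formulas, sketched below. Assumptions: these standard formulas (not necessarily the authors' exact expressions), a zero contact angle for the cylindrical roller bearing, and the common convention of taking twice the roller spin frequency as the rolling element defect frequency. The roller count and diameter ratio in the usage lines are hypothetical values back-solved from the quoted frequencies, not the NJ208 data of Table 4.

```python
import math


def bearing_fault_frequencies(rpm, n_rollers, d, D, alpha_deg=0.0):
    """Textbook kinematic defect frequencies (Hz), stationary outer race, rotating shaft.

    d: rolling element diameter, D: pitch diameter (same units), alpha_deg: contact angle.
    Returns (BPFO, BPFI, 2*BSF); 2*BSF is used here as the rolling element defect frequency.
    """
    fr = rpm / 60.0                                   # shaft rotation frequency
    r = d / D * math.cos(math.radians(alpha_deg))
    bpfo = 0.5 * n_rollers * fr * (1.0 - r)           # outer race defect
    bpfi = 0.5 * n_rollers * fr * (1.0 + r)           # inner race defect
    bsf = 0.5 * (D / d) * fr * (1.0 - r ** 2)         # rolling element spin frequency
    return bpfo, bpfi, 2.0 * bsf


# Hypothetical geometry back-solved from the quoted values, NOT read from Table 4:
# BPFO + BPFI = n_rollers * fr  ->  349.1 / (1496 / 60) ~ 14
# (BPFI - BPFO) / (BPFI + BPFO) = d/D (alpha = 0)  ->  63.5 / 349.1 ~ 0.1819
for name, f in zip(("outer race", "inner race", "rolling element"),
                   bearing_fault_frequencies(1496, 14, d=0.1819, D=1.0)):
    print(f"{name}: {f:.1f} Hz")
```

With these back-solved values the three frequencies land within about 0.1 Hz of 142.8, 206.3 and 132.6 Hz, which suggests the quoted rolling element frequency follows the 2 × BSF convention; the actual geometry should be taken from Table 4.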

Fig. 7

Rotating machine test rig

Table 4 The geometry and fault frequencies of bearings
(c) Sparsity-based fault feature extraction by optimizing the data fidelity term

The measured outer race fault vibration signal with 4096 samples and its Fourier spectrum are shown in Fig. 8a, b. The fault feature cannot be seen in Fig. 8. Because the Laplace wavelet is morphologically similar to the impulse caused by a localized defect in a rolling bearing, it is selected as the atom to construct the over-complete dictionary, from which the sparse representation model is established. Solving the model with the SALSA algorithm gives the analysis results in Fig. 9. Figure 9a shows the optimal wavelet atom obtained by correlation filtering, with parameters \(\bar{f} = 3024\) Hz, \(\bar{\zeta } = 0.0890\) and \(\bar{\tau } = 0.0159\) s. The first N elements of the representation coefficient vector \({\hat{\mathbf{c}}}\) are given in Fig. 9b. The cyclic period \(\hat{T} = 7.01\) ms can be identified in Fig. 9c, consistent with the theoretical value \(T = 7.00\) ms, and the occurrence times of the impulses can also be identified in Fig. 9c.
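Correlation filtering over the parameter set \(\Psi\) can be sketched as a grid search. Assumptions: the Laplace wavelet is taken in its common form \(\psi(t) = e^{-\frac{\zeta}{\sqrt{1-\zeta^{2}}}2\pi f(t-\tau)}\sin(2\pi f(t-\tau))\) for \(t \ge \tau\) and zero otherwise, the correlation coefficient is \(k_{r} = |\langle x, \psi\rangle|/(\|x\|\,\|\psi\|)\), and the grid values and function names are hypothetical. The definitions in Sect. 2 may differ in detail.

```python
import itertools

import numpy as np


def laplace_wavelet(t, f, zeta, tau):
    """Laplace wavelet in its common form (assumed): a damped sine starting at t = tau, zero before."""
    s = np.maximum(t - tau, 0.0)              # for t < tau, s = 0 and sin(0) makes the atom zero
    w = 2.0 * np.pi * f
    return np.exp(-zeta / np.sqrt(1.0 - zeta ** 2) * w * s) * np.sin(w * s)


def correlation_filter(x, fs, f_grid, zeta_grid, tau_grid):
    """Grid search over the parameter set Psi for the atom maximizing
    k_r = |<x, psi>| / (||x|| ||psi||). Returns ((f, zeta, tau), k_r)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    best, best_kr = None, -1.0
    for f, zeta, tau in itertools.product(f_grid, zeta_grid, tau_grid):
        psi = laplace_wavelet(t, f, zeta, tau)
        norm = np.linalg.norm(psi)
        if norm == 0.0:                       # atom starts after the record ends
            continue
        kr = abs(np.dot(x, psi)) / (np.linalg.norm(x) * norm)
        if kr > best_kr:
            best, best_kr = (f, zeta, tau), kr
    return best, best_kr


# Noise-free check: the record is itself an atom on the (hypothetical) grid.
fs = 51200.0
t = np.arange(2048) / fs
x = laplace_wavelet(t, 3000.0, 0.09, 0.01)
params, kr = correlation_filter(x, fs, [2000.0, 3000.0, 4000.0],
                                [0.05, 0.09, 0.14], [0.005, 0.01, 0.015])
print(params, round(kr, 3))   # (3000.0, 0.09, 0.01) 1.0
```

The cost grows with the product of the three grid sizes, which is the accuracy versus efficiency trade-off discussed in Sect. 6.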

Fig. 8

The measured outer race fault vibration signal. a Vibration signal, and b Fourier spectrum

Fig. 9

The analysis results of the outer race fault vibration signal by optimizing the data fidelity term. a Optimal Laplace wavelet atom; b sparse representation coefficients, and c reconstructed signal

The measured inner race fault vibration signal and its Fourier spectrum are shown in Fig. 10a, b. The fault feature cannot be identified from Fig. 10.

Fig. 10

The measured inner race fault vibration signal. a Vibration signal, and b Fourier spectrum

Figure 11 exhibits the analysis result of the inner race vibration signal obtained by the proposed method. The optimal wavelet atom obtained by using correlation filtering is shown in Fig. 11a. The related parameters are \(\bar{f} = 6402\) Hz, \(\bar{\zeta } = 0.1400\), \(\bar{\tau } = 0.0467\) s. The first N elements of the representation coefficient vector \({\hat{\mathbf{c}}}\) are presented in Fig. 11b. The cyclic period \(\hat{T} = 4.85\) ms can be clearly identified in Fig. 11c, which is consistent with the theoretical value \(T = 4.85\) ms.

Fig. 11

The analysis results of the inner race fault vibration signal by optimizing the data fidelity term. a Optimal Laplace wavelet atom, b sparse representation coefficients, c reconstructed signal

The measured rolling element fault vibration signal and its Fourier spectrum are shown in Fig. 12a, b.

Fig. 12

The measured rolling element fault vibration signal: a vibration signal, and b Fourier spectrum

Figure 13 gives the analysis result of the rolling element vibration signal obtained by the proposed method. The optimal wavelet atom obtained by using correlation filtering is shown in Fig. 13a. The related parameters are \(\bar{f} = 3024\) Hz, \(\bar{\zeta } = 0.089\), \(\bar{\tau } = 0.0159\) s. The first N elements of the representation coefficient vector \({\hat{\mathbf{c}}}\) are displayed in Fig. 13b. The cyclic period \(\hat{T} = 7.47\) ms can be identified in Fig. 13c, which is consistent with the theoretical value \(T = 7.54\) ms.

Fig. 13

The analysis results of the rolling element fault vibration signal by optimizing the data fidelity term: a Optimal Laplace wavelet atom, b sparse representation coefficients, and c reconstructed signal

(d) Sparsity-based fault feature extraction by optimizing the penalty term

Another group of fault signals, each 5120 samples long, was measured on the same test rig (Fig. 7). The over-complete dictionary is constructed from the Laplace wavelet, and a sparse representation model is established with it. The MM algorithm is then applied to solve the model and obtain the representation coefficients. The measured outer race fault vibration signal and its Fourier spectrum are shown in Fig. 14a, b; no information related to bearing defects can be recognized.
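A common way to build such a dictionary from one optimal atom is to place time-shifted copies of it in the columns of \(A\), so that each coefficient is the amplitude of an impulse starting at a given sample. The sketch below assumes that layout; the chapter's own construction (Sect. 2) may differ, for example in normalization or shift step. The atom uses \(\bar{f} = 3024\) Hz and \(\bar{\zeta} = 0.089\) from the outer race case in the assumed Laplace form, while the impulse positions are hypothetical. With a unit shift the matrix is square; stacking several such blocks side by side gives an over-complete dictionary.

```python
import numpy as np


def shift_dictionary(atom, n, step=1):
    """Dictionary whose column j is `atom` delayed by j*step samples, truncated to length n,
    with unit-norm columns (columns that end up all-zero are left as zeros)."""
    atom = np.asarray(atom, dtype=float)
    shifts = range(0, n, step)
    A = np.zeros((n, len(shifts)))
    for j, s in enumerate(shifts):
        L = min(len(atom), n - s)             # truncate atoms running past the record end
        A[s:s + L, j] = atom[:L]
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0.0] = 1.0                 # avoid dividing all-zero columns by zero
    return A / norms


# Usage (hypothetical impulse positions): assumed Laplace-form atom at 51.2 kHz,
# impulses every 359 samples (about 7.01 ms), reconstructed as x_hat = A c.
fs = 51200.0
ta = np.arange(256) / fs
w = 2 * np.pi * 3024.0
atom = np.exp(-0.089 / np.sqrt(1 - 0.089 ** 2) * w * ta) * np.sin(w * ta)
A = shift_dictionary(atom, 1024)
c = np.zeros(A.shape[1])
c[100::359] = 1.0
x_hat = A @ c
```

Dense storage is used here for clarity; for long records such as the 5120-sample signals, applying \(A\) and \(A^{*}\) as convolutions avoids building the matrix explicitly.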

Fig. 14

The measured outer race fault vibration signal: a vibration signal, and b Fourier spectrum

Figure 15 gives the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector \({\hat{\mathbf{c}}}\) is given in Fig. 15a, in which the cyclic period \(\hat{T} = 7.02\) ms can be discerned. The reconstructed signal is shown in Fig. 15b.

Fig. 15

The analysis results of the outer race fault vibration signal by optimizing penalty term: a sparse representation coefficients, and b reconstructed signal

The measured inner race fault vibration signal and its Fourier spectrum are shown in Fig. 16a, b; neither clearly reveals the bearing health condition.

Fig. 16

The measured inner race fault vibration signal: a vibration signal, and b Fourier spectrum

Figure 17 shows the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector \({\hat{\mathbf{c}}}\) is given in Fig. 17a, in which the cyclic period \(\hat{T} = 4.84\) ms can be discerned. The reconstructed signal, in which the impulses are easily observed, is shown in Fig. 17b.

Fig. 17

The analysis results of the inner race fault vibration signal by optimizing penalty term: a sparse representation coefficients, and b reconstructed signal

The measured rolling element fault vibration signal and its Fourier spectrum are shown in Fig. 18a, b. No frequency information associated with rolling elements can be recognized in Fig. 18b.

Fig. 18

The measured rolling element fault vibration signal: a vibration signal, and b Fourier spectrum

Figure 19 exhibits the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector \({\hat{\mathbf{c}}}\) is shown in Fig. 19a, where the cyclic period \(\hat{T} = 7.51\) ms can be identified. The reconstructed signal in Fig. 19b clearly shows the cyclic impulses generated by the rolling element defect.

Fig. 19

The analysis results of the rolling element fault vibration signal by optimizing penalty term: a sparse representation coefficients, and b reconstructed signal

5.3 Application in Compound Fault Feature Extraction

Apart from single-fault detection in rotating machinery, compound fault diagnosis has also been gaining attention in recent years. Taking a compound fault in a gearbox as an example, this section applies the sparse representation method to separate and extract the compound fault features of the gearbox.

The test rig, a single-stage transmission gearbox on a test bed, is shown in Fig. 20. The gearbox contains both a bearing fault and a gear fault, which are shown in Fig. 21. The faulty gear is a helical gear whose parameters are listed in Table 5. The bearing used in the experiment is a 30205 tapered roller bearing, whose geometric parameters are listed in Table 6. From these parameters, the theoretical fault characteristic frequency of the bearing is calculated as 176.18 Hz.

Fig. 20

Experimental gearbox in a test-bed

Fig. 21

Fault components

Table 5 Working parameters of gears in the tested gearbox
Table 6 Geometry of the tested bearing

The measured vibration signal with compound faults is shown in Fig. 22; the characteristics of the individual faults cannot be identified clearly from it. The sparse representation method is therefore applied to extract the fault features one at a time. The extraction order takes the signal propagation path into account: because the sensor is mounted on the bearing end cover, closer to the faulty bearing, the bearing fault feature is extracted first. SALSA is selected as the optimization algorithm. Then the optimal Laplace wavelet, which is effective in representing bearing-fault-induced impulses and is determined by correlation filtering, is used to construct the over-complete dictionary \(\varvec{A}_{1}\) as explained in Sect. 2. The selected Laplace wavelet is shown in Fig. 23a. Incorporating \(\varvec{A}_{1}\) into the SALSA iterations yields the sparse coefficients \({\hat{\varvec{c}}}_{1}\) of the bearing fault, shown in Fig. 23b. The corresponding reconstructed signal, \(\hat{x}_{1} = \varvec{A}_{1} \hat{\varvec{c}}_{1}\), is illustrated in Fig. 23c. To obtain the fault characteristics, envelope spectrum analysis is applied to the reconstructed signal, giving the result in Fig. 23d. The characteristic frequency of the faulty bearing, 174.1 Hz, is easily recognized and is close to the theoretical value of 176.18 Hz. It can therefore be concluded that the bearing is defective.
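Envelope spectrum analysis of a reconstructed signal can be sketched with NumPy alone: the envelope is the magnitude of the FFT-based analytic signal (the same construction `scipy.signal.hilbert` uses), and its spectrum is taken after removing the mean. The function names, the DC removal and the synthetic amplitude-modulated test signal are assumptions made here for illustration.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (the same construction scipy.signal.hilbert uses)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def envelope_spectrum(x, fs):
    """Frequencies (Hz) and single-sided amplitude of the envelope spectrum of x."""
    env = np.abs(analytic_signal(x))   # envelope = magnitude of the analytic signal
    env = env - env.mean()             # remove DC so it does not mask the fault line
    n = len(env)
    amp = 2.0 * np.abs(np.fft.rfft(env)) / n
    return np.fft.rfftfreq(n, d=1.0 / fs), amp


# Synthetic check: a 3 kHz carrier amplitude-modulated at 174 Hz (hypothetical numbers).
fs = 20000.0
t = np.arange(20000) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 174.0 * t)) * np.sin(2 * np.pi * 3000.0 * t)
freqs, amp = envelope_spectrum(x, fs)
print(freqs[np.argmax(amp)])   # about 174 Hz
```

The envelope demodulates the resonance carrier, so the fault repetition rate appears as a low-frequency line rather than as sidebands around the carrier.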

Fig. 22

The measured signal with compound fault of gearbox

Fig. 23

Results of bearing fault signal: a optimal Laplace basis, b sparse coefficients, c reconstructed signal, d the envelope spectrum analysis of the reconstructed signal, and e the estimated bearing fault signal

The sparse vector \({\hat{\varvec{c}}}_{{\mathbf{1}}}\) represents the amplitude of each transient impulse caused by the localized bearing fault. To estimate the true amplitude of the bearing fault transients, a constrained optimization strategy is proposed that scales each single fault component by a parameter k. Denoting the spectrum of the residual signal \(x - k\hat{x}_{1}\) by \(F_{1} \left( f \right)\), the problem is formulated as:

$$\begin{aligned} & \hbox{min} \left\{ {F_{1} \left( f \right)} \right\} \\ & {\text{subject to }}k > 0,f = f_{z1} \, \\ \end{aligned}$$
(49)

where \(x\) is the original measured signal, \(f_{z1}\) is the peak frequency and k is a positive parameter. When \(F_{1} \left( f \right)\) is minimized subject to these constraints, the bearing fault component has been removed from the residual signal to the largest possible extent. Solving problem (49) gives an optimal value \(k_{opt}\), and the estimated bearing fault signal is \(x_{1} = k_{opt} \hat{x}_{1}\). Figure 23e shows the estimated bearing fault component with \(k_{opt} = 1.332\).
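Problem (49) has a single scalar unknown, so it can be solved in closed form if \(F_{1}(f)\) is taken as the DFT magnitude of the residual at the bin nearest \(f_{z1}\) (an assumption; the text does not state whether a magnitude or power spectrum is used). For real \(k\), \(|X(f_{z1}) - k\hat{X}_{1}(f_{z1})|^{2}\) is minimized by \(k = \mathrm{Re}\{\hat{X}_{1}^{*}X\}/|\hat{X}_{1}|^{2}\), which is then checked against \(k > 0\). A hedged sketch with hypothetical names and test signals:

```python
import numpy as np


def estimate_k(x, x1_hat, fs, f_z1):
    """Sketch of problem (49): the k > 0 minimizing |DFT(x - k*x1_hat)| at the bin nearest f_z1.

    For real k, |X - k*X1|^2 is minimized by k = Re(conj(X1) X) / |X1|^2.
    Returns None when that value is not positive (constraint k > 0 cannot be met).
    """
    X = np.fft.rfft(x)
    X1 = np.fft.rfft(x1_hat)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    b = int(np.argmin(np.abs(freqs - f_z1)))        # bin closest to the peak frequency f_z1
    k = float(np.real(np.conj(X1[b]) * X[b]) / np.abs(X1[b]) ** 2)
    return k if k > 0 else None


# Synthetic check (hypothetical signals): the measured signal contains 1.5 times the
# reconstructed component plus an unrelated 30 Hz tone.
fs = 1000.0
t = np.arange(1000) / fs
x1_hat = np.sin(2 * np.pi * 100.0 * t)
x = 1.5 * x1_hat + np.sin(2 * np.pi * 30.0 * t)
k_opt = estimate_k(x, x1_hat, fs, f_z1=100.0)
x1 = k_opt * x1_hat        # estimated bearing fault component, x1 = k_opt * x1_hat
print(round(k_opt, 6))     # 1.5
```

A grid search over k would give the same answer here; the closed form simply avoids choosing a grid.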

Removing the estimated bearing fault signal from the original signal gives the residual signal shown in Fig. 24. As for the bearing fault, SALSA is chosen as the optimization algorithm. The optimal Morlet wavelet basis, obtained by correlation filtering and presented in Fig. 25a, is used to construct the dictionary \(\varvec{A}_{2}\). With \(\varvec{A}_{2}\), the iterative procedure yields the sparse vector \(\hat{c}_{2}\) representing the gear fault feature, shown in Fig. 25b, and its reconstructed signal is shown in Fig. 25c. The envelope spectrum of the reconstructed signal, illustrated in Fig. 25d, reveals the gear fault characteristic frequency of 25.6 Hz, close to the theoretical value of 24.67 Hz. This indicates a localized gear fault in the tested gearbox.

Fig. 24

The residual signal after removing the bearing fault signal

Fig. 25

Results of gear fault signal: a optimal Morlet basis, b sparse coefficients, c reconstructed signal, and d the envelope spectrum analysis of the reconstructed signal

6 Discussions

In this chapter, a new transient extraction technique is introduced based on the sparse representation. To be more specific, the sparse representation model and over-complete dictionary are first constructed, and then the model can be solved by optimizing either the data fidelity term or the penalty term. Both are effective in extracting the transients and identifying the periodic parameters. The effectiveness has been demonstrated by the experimental applications. However, some issues about the proposed method still remain to be discussed.

  (1) In this chapter, the \(l_{1}\)-norm is used in place of the \(l_{0}\)-norm in the sparse representation model. Another available sparsity measure is the \(l_{p}\)-norm, which leads to the following problem:

    $$\mathop {\hbox{min} }\limits_{{\varvec{c}}} \left\| \varvec{c} \right\|_{p}^{p} \quad {\text{s.t}}. \quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \le \varepsilon$$
    (50)

    Choosing p < 1 promotes sparser solutions but makes the optimization problem non-convex. More generally, the \(l_{p}\)-norm can be replaced by \(J\left({\varvec{c}} \right) = \sum\nolimits_{i} {\rho \left({c_{i} } \right)}\): any such function with \(\rho \left( {c_{i} } \right)\) symmetric, monotonically non-decreasing, and with a monotonically non-increasing derivative for \(c \ge 0\) promotes sparsity [7].

  (2) Selection of the wavelet basis is a key issue for the proposed method because it influences the sparsity of the coefficient vector c. As the noise amplitude increases, the correlation values decrease sharply, leading to errors between the estimated and theoretical values. In addition, empirical knowledge of gearbox and bearing faults is used to construct the over-complete dictionary. If the dictionary could instead be learned from the measured signals, incorporating the fault features of rotating components, the algorithms in this chapter would be more powerful for mechanical fault diagnosis.

  (3) The strategy for determining the optimal wavelet atom and the algorithms for solving the sparse representation model are also vital for successfully applying sparse representation to machinery fault feature extraction.

    • This chapter employs correlation filtering to select the optimal wavelet atom. Its disadvantage is that a wider search range and a smaller step for the parameter set Ψ improve the accuracy of the result but incur excessive computation, reducing the efficiency of the method. The strategy for optimal wavelet basis selection should therefore be developed further to ensure both computational efficiency and estimation accuracy.

    • This chapter uses the SALSA and MM algorithms to solve the basis pursuit denoising (BPD) problem. However, simpler yet effective algorithms for solving the sparse representation model have not been widely explored.