Sparse Representation of the Transients in Mechanical Signals

Zhu, Zhongkui; Fan, Wei; Cai, Gaigai; Huang, Weiguo; Shi, Juanjuan

doi:10.1007/978-3-319-56126-4_9


Part of the book series: Smart Sensors, Measurement and Instrumentation ((SSMI,volume 26))


Abstract

This chapter focuses on the sparse representation of the transients in mechanical signals. Sparse representation means that the signal can be represented by an optimal linear combination of atoms by a specialized over-complete dictionary, leading to the sparsity of representation coefficients. Signal sparse representation consists of two main aspects, i.e., dictionary construction and optimization solution. This chapter also presents the applications of sparse representation, mainly in mechanical fault feature detection, such as fault detection of rolling bearings, gearboxes and compound bearing faults.


1 Introduction

Traditional signal representation methods often express a signal as a linear combination of orthogonal atoms that compose a complete dictionary. Because these atoms are not tailored to the signal, a large number of representation coefficients is required to recover it. With a specialized over-complete dictionary, by contrast, the signal can be represented by an optimal linear combination of a few atoms, which makes the representation coefficients sparse. Sparse representation has proven to be a powerful tool in signal processing, image processing, computer vision and pattern recognition [1,2,3]. Recently, much work has been done to introduce sparse representation theory to fault feature extraction from mechanical vibration signals [4,5,6].

The main purpose of fault diagnosis is to ensure the availability, reliability and operational safety of equipment. Fault feature extraction, which allows one to distinguish the faulty condition from the normal condition, is one of its most important tasks. When a defect occurs on a rotating element (e.g. a gear or a bearing), it interacts with the mating elements and produces a series of impulses in the vibration signal. Under constant-speed operation, the vibration response consists of periodic impulses. These transients share a similar waveform in the time domain and therefore admit a sparse description. Owing to these transient and sparse properties, fault feature extraction can be recast as sparse representation of the fault features.

Signal sparse representation consists of two main aspects: dictionary construction and optimization solution. With a suitably constructed dictionary, a few atoms can be combined to represent the fault features effectively, while the same atoms represent the noise poorly. Thus, a sparse representation of the fault features is obtained and the background noise is removed at the same time. Moreover, with an efficient algorithm, the representation coefficients can be computed easily.

This chapter will present an overview of the sparse representation theory. Moreover it will introduce how to deal with the two main aspects of sparse representation for the mechanical vibration signal processing. The application of sparse representation in mechanical fault feature detection will also be explored in this chapter, such as fault detection of rolling bearings, gearboxes and compound bearing faults.

2 Sparse Representation Theory

2.1 Sparse Representation Model

Consider a matrix $\varvec{A} \in R^{N \times M}$ with $N < M$ whose columns are the atoms $\left\{ {\varvec{a}_{i} } \right\}_{i = 1}^{M}$ and which contains $N$ linearly independent columns, so that its columns span an $N$-dimensional Hilbert space. Suppose the measured fault vibration signal can be written as

$$y(t) = x(t) + n(t)$$

(1)

where $y(t)$ is the measured vibration signal, $x(t)$ is the fault-induced signal component without noise and $n(t)$ is the noise. Equation (1) can also be written as

$$\varvec{y} = \varvec{x} + \varvec{n}$$

(2)

$\varvec{x}$ can be represented with an over-complete matrix $\varvec{A}$ as

$$\varvec{x} = \sum\limits_{i = 1}^{M} {c_{i} \varvec{a}_{i} }$$

(3)

or more compactly $\varvec{x} = \varvec{Ac}$, where $\varvec{c}$ is an $M \times 1$ column vector of representation coefficients. If $\varvec{x}$ is not in the span of the columns of $\varvec{A}$, Eq. (3) has no solution; otherwise, it has an infinite number of solutions, with the general solution having $l$ free parameters, where $l$ is the difference between the number of variables $M$ and the rank of $\varvec{A}$.

Among these solutions, some are more useful than others. To narrow the choice to a single well-defined solution, additional criteria are needed [7]. The traditional way to achieve this is to introduce a regularization function $J(\varvec{c})$ and define the general optimization problem:

$$\mathop {\hbox{min} }\limits_{\varvec{c}} J(\varvec{c})\quad {\text{s}}.{\text{t}}.\;\;\varvec{Ac} = \varvec{x}$$

(4)

There are many possible choices for the objective function $J(\varvec{c})$; a well-known one is $\left\| \cdot \right\|_{p}$, the $l_{p}$-norm

$$\left\| \varvec{c} \right\|_{p} = \left( {\sum\nolimits_{i} {\left| {c_{i} } \right|^{p} } } \right)^{{\frac{1}{p}}}$$

(5)

Letting $p \to 0$ in the $p$-th power of the $l_{p}$-norm gives the $l_{0}$-norm

$$\left\| \varvec{c} \right\|_{0} = \mathop {\lim }\limits_{p \to 0} \left\| \varvec{c} \right\|_{p}^{p} = \mathop {\lim }\limits_{p \to 0} \sum\limits_{i} {\left| {c_{i} } \right|^{p} }$$

(6)

Since $\left| {c_{i} } \right|^{p} \to 1$ for every $c_{i} \ne 0$ while the zero entries contribute nothing, this limit is usually written directly as the count

$$\left\| \varvec{c} \right\|_{0} = \# \left\{ {i{:}c_{i} \ne 0} \right\}$$

(7)

which represents the total number of non-zero elements in a vector. For the underdetermined linear system of Eq. (3), the aim is to find the sparsest coefficient vector $\varvec{c}$ that “explains” the signal $\varvec{x}$. The sparsest solution is the one with the fewest non-zero elements, i.e. the lowest $l_{0}$-norm, which leads to the following problem

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{0} \quad {\text{s}}.{\text{t}}.\;\;\varvec{Ac} = \varvec{x}$$

(8)

Considering the noise component in the measured signal $\varvec{y}$, Eq. (8) can be written as

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{0} \quad {\text{s}}.{\text{t}}.\quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \;\; \le \varepsilon$$

(9)

Equation (9) is the sparse representation model.
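The counting definition in Eq. (7) and the behaviour of the sum $\sum\nolimits_{i} {\left| {c_{i} } \right|^{p} }$ for small $p$ can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `l0_norm` and `lp_power_sum` and the example vector are illustrative and do not come from the chapter.

```python
import numpy as np

def l0_norm(c, tol=0.0):
    """Counting definition of Eq. (7): number of entries with |c_i| > tol."""
    return int(np.sum(np.abs(c) > tol))

def lp_power_sum(c, p):
    """Sum of |c_i|^p over the non-zero entries, i.e. ||c||_p^p for p > 0."""
    nz = np.abs(c[c != 0])
    return float(np.sum(nz ** p))

# Illustrative coefficient vector with three non-zero entries.
c = np.array([0.0, 3.0, 0.0, -0.5, 0.0, 2.0])

count = l0_norm(c)                 # number of non-zero coefficients
l1 = float(np.sum(np.abs(c)))      # l1-norm, Eq. (5) with p = 1
# As p shrinks, the power sum approaches the number of non-zero entries.
approx = [lp_power_sum(c, p) for p in (1.0, 0.1, 0.001)]
```

For this vector the power sum moves from the $l_{1}$-norm at $p = 1$ towards the non-zero count as $p$ decreases, which is why the $l_{0}$-norm serves as the sparsity measure in Eqs. (8) and (9).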

After the construction of the representation model, two main aspects should be taken into consideration: (a) how to construct a suitable dictionary $\varvec{A}$ to ensure the sparsity of the coefficient vector $\varvec{c}$; (b) how to solve the model to obtain the sparse representation vector. These two problems will be elaborated in the following.

2.2 Construction of the Over-Complete Dictionary

Signal representation can be considered as a way to observe and learn the features of a signal from different aspects. Signal processing techniques commonly require meaningful representations that capture the useful characteristics of the signal [8]. A signal is sparsely represented over a dictionary when a few atoms of the dictionary can be combined to form it, so that only a few coefficients are needed to capture the information of interest. It is therefore vital to construct the dictionary well, so that it yields a sparse representation of the signal features. Several over-complete dictionaries have been established for this purpose and are commonly used.

(a) Gabor dictionary

Gabor atoms, proposed by Dennis Gabor, form a family of functions built by scaling, translating and modulating a generating function. The Gabor atom is defined as

$$g_{\gamma } (t) = \frac{1}{\sqrt s }g(\frac{t - u}{s}){\text{e}}^{j\xi t}$$

(10)

where $g(t) = {\text{e}}^{{ - \pi t^{2} }}$ is the Gaussian window function, $\gamma = (s,u,\xi )$ are the parameters of the atom, $s$ is the scaling factor, $u$ is the translation factor, $\xi$ is the frequency factor. The function $g_{\gamma } (t)$ is centered at $u$ and its energy is mostly concentrated in a neighborhood of $u$ whose size is proportional to $s$.

Owing to its good time-frequency resolution, the Gabor dictionary has been commonly used to analyze EEG signals [9], audio signals [10] and so on. However, since each Gabor atom has a fixed frequency and the dictionary tiles the time-frequency plane with a rectangular grid, it is weak at analyzing signals whose frequency varies with time.
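A single Gabor atom of Eq. (10) can be generated on a sampled time axis as sketched below. This is an illustrative example assuming NumPy; the function name `gabor_atom`, the sampling rate and the parameter values are assumptions chosen only for demonstration.

```python
import numpy as np

def gabor_atom(t, s, u, xi):
    """Complex Gabor atom of Eq. (10) with Gaussian window g(t) = exp(-pi t^2)."""
    window = np.exp(-np.pi * ((t - u) / s) ** 2)
    return window * np.exp(1j * xi * t) / np.sqrt(s)

fs = 1000.0                      # assumed sampling rate in Hz
t = np.arange(1000) / fs         # 1 s time axis
# Atom centered at u = 0.5 s, scale s = 0.05 s, modulated at 100 Hz.
atom = gabor_atom(t, s=0.05, u=0.5, xi=2 * np.pi * 100.0)

peak_time = t[np.argmax(np.abs(atom))]        # location of the envelope maximum
energy = np.sum(np.abs(atom) ** 2) / fs       # Riemann-sum estimate of the energy
```

The envelope peaks at $t = u$, and the factor $1/\sqrt s$ keeps the energy of the atom independent of the scale $s$, so atoms of different widths are comparable within one dictionary.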

(b) Chirplet dictionary

A Chirplet atom, which is built from the unit Gaussian window by dilation, translation, frequency and chirp modulation, is defined as

$$g_{\gamma } (t)=\frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {1 + r \cdot \left( {\frac{t - u}{\sigma }} \right)} \right] \cdot f_{c} \cdot \left( {\frac{t - u}{\sigma }} \right)} \right\}$$

(11)

Equation (11) can also be written as

$$g_{\gamma } (t) = \frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {f_{c} \cdot \left( {\frac{t - u}{\sigma }} \right) + \frac{1}{2}\xi \cdot \left( {\frac{t - u}{\sigma }} \right)^{2} } \right]} \right\}$$

(12)

where $\xi = 2rf_{c}$ is the linear chirp rate (the frequency-variant factor), $\gamma = \left( {\sigma ,u,\xi ,f_{c} } \right)$ is the set of atom parameters, $\sigma$ is the scale operator which controls the width of the function, $u$ is the time center of the Chirplet function, and $f_{c}$ is its frequency center.

Different parameter values denote different Chirplet functions, which together compose the Chirplet dictionary. Because the frequency of its atoms can change linearly, the Chirplet dictionary is well suited to signals whose frequency varies linearly with time, such as radar and sonar signals [11]. However, when the frequency varies nonlinearly with time, the Chirplet dictionary becomes less effective and even inaccurate.
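The equivalence of Eqs. (11) and (12) under $\xi = 2rf_{c}$ can be verified numerically. The sketch below assumes NumPy and the Gaussian window $g(t) = {\text{e}}^{ - \pi t^{2} }$ of Eq. (10); the function names and parameter values are illustrative only.

```python
import numpy as np

def chirplet_eq11(t, sigma, u, fc, r):
    """Chirplet atom in the form of Eq. (11), parameterized by r."""
    x = (t - u) / sigma
    phase = 2 * np.pi * (1 + r * x) * fc * x
    return np.exp(-np.pi * x ** 2) * np.exp(1j * phase) / np.sqrt(sigma)

def chirplet_eq12(t, sigma, u, fc, xi):
    """Chirplet atom in the form of Eq. (12), parameterized by the chirp rate xi."""
    x = (t - u) / sigma
    phase = 2 * np.pi * (fc * x + 0.5 * xi * x ** 2)
    return np.exp(-np.pi * x ** 2) * np.exp(1j * phase) / np.sqrt(sigma)

t = np.linspace(0.0, 1.0, 2001)
sigma, u, fc, r = 0.2, 0.5, 8.0, 0.75       # illustrative parameter values

a11 = chirplet_eq11(t, sigma, u, fc, r)
a12 = chirplet_eq12(t, sigma, u, fc, xi=2 * r * fc)
max_diff = float(np.max(np.abs(a11 - a12)))  # expected to be at rounding level
```

Expanding the phase of Eq. (11) gives $f_{c} x + rf_{c} x^{2}$ with $x = (t - u)/\sigma$, which matches the phase $f_{c} x + \frac{1}{2}\xi x^{2}$ of Eq. (12) exactly when $\xi = 2rf_{c}$. Setting $r = 0$ removes the chirp and leaves an atom with a fixed frequency.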

(c) FM^mlet dictionary

To characterize both the time-invariant and the time-varying spectral contents of a signal, the dilated and translated windowed exponential frequency modulated functions (FM^mlet) were proposed by Zou et al. [12]. The FM^mlet atom is defined as

$$g_{\gamma } (t) = \frac{1}{\sqrt \sigma }g\left( {\frac{t - u}{\sigma }} \right)\exp \left\{ {j2\pi \left[ {1 + r\left( {\frac{t - u}{\sigma }} \right)} \right]^{m} f_{c} \left( {\frac{t - u}{\sigma }} \right)} \right\}$$

(13)

The atom is expressed by the following five parameters: scaling operator $\sigma$, time-center $u$, frequency center $f_{c}$, chirp rate $r$, and FM exponent $m$.

The FM^mlet dictionary is more flexible when dealing with signals whose spectral contents vary nonlinearly with time; it has therefore been used to process earthquake signals [13] and ECG signals [14].

2.3 Solution to Sparse Representation Model

After a suitable dictionary has been constructed, the next task is to develop a reliable and efficient algorithm to solve the sparse representation model. Equation (9) cannot be solved by a straightforward approach, and a number of methods have been developed for it in recent years. These methods fall into two main categories: greedy algorithms and convex relaxation techniques. The greedy algorithms include matching pursuit (MP) [15], orthogonal matching pursuit (OMP) [16], regularized orthogonal matching pursuit (ROMP) [17], etc., and the convex relaxation techniques include basis pursuit (BP) [18], basis pursuit denoising (BPD, also abbreviated BPDN) [19], etc. The following focuses on the MP and BPD algorithms.

(a) Matching pursuit

Introduced in Ref. [15], the matching pursuit algorithm uses a greedy heuristic to iteratively construct a decomposition of the original signal. Its basic idea is to represent a signal $\varvec{x}$ from a Hilbert space as a weighted sum of atoms $\varvec{\phi}_{\gamma i}$ taken from an over-complete dictionary $\varvec{\phi}$,

$$\varvec{x} = \sum\limits_{i = 1}^{m} {a_{\gamma i} }\varvec{\phi}_{\gamma i} + \varvec{r}_{m}$$

(14)

where $a_{\gamma i}$ is the weighting factor of each atom, $\gamma$ is the parameter of each atom, and $\varvec{r}_{m}$ is the residual signal after $m$ iterations. The residual is updated recursively by

$$\varvec{r}_{i} = \varvec{r}_{i - 1} - a_{\gamma i}\varvec{\phi}_{\gamma i}$$

(15)

with the initialization $\varvec{r}_{0} = \varvec{x}$. The weighting factor $a_{\gamma i} = \left\langle {\varvec{r}_{i - 1} ,\varvec{\phi}_{\gamma i} } \right\rangle$ is the inner product of the previous residual and the selected atom.

Given the fixed over-complete dictionary, the matching pursuit algorithm first finds the atom that has the largest inner product with the current residual (initially the signal itself), then subtracts the contribution of that atom from the residual, and repeats the process until the stopping criterion is satisfied. The procedures of matching pursuit are illustrated in Table 1.
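The greedy loop of Eqs. (14) and (15) can be sketched as follows. This is a minimal illustration, assuming NumPy, real-valued signals and a dictionary matrix `D` with unit-norm columns; the function name `matching_pursuit`, the random test dictionary and the stopping rule (an iteration cap plus a relative residual tolerance) are illustrative choices and are not taken from Table 1.

```python
import numpy as np

def matching_pursuit(x, D, n_iter=10, tol=1e-6):
    """Greedy MP over a dictionary D whose columns are unit-norm atoms.

    Returns the coefficient vector and the final residual (Eqs. 14 and 15).
    """
    residual = x.astype(float).copy()           # r_0 = x
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        inner = D.T @ residual                  # inner products with all atoms
        k = int(np.argmax(np.abs(inner)))       # best-matching atom
        coeffs[k] += inner[k]                   # accumulate its weight
        residual = residual - inner[k] * D[:, k]
        if np.linalg.norm(residual) <= tol * np.linalg.norm(x):
            break
    return coeffs, residual

# Synthetic test: a signal built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                  # normalize the atoms
c_true = np.zeros(256)
c_true[[10, 100]] = [2.0, -1.5]
x = D @ c_true

coeffs, residual = matching_pursuit(x, D, n_iter=50)
```

By construction the identity $\varvec{x} = \sum\nolimits_{i} {a_{\gamma i} \varvec{\phi}_{\gamma i} } + \varvec{r}_{m}$ holds after every iteration, and the norm of the residual never increases. Because an atom may be selected more than once, its weights are accumulated in `coeffs`.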

Table 1 Procedures of matching pursuit algorithm


(b) Basis pursuit denoising

Naturally, the $l_{0}$-norm is used to measure the sparsity of the representation coefficients. However, $l_{0}$-norm minimization is an NP-hard problem because of its combinatorial nature, which makes it intractable for problems of practical size. Replacing the $l_{0}$-norm in Eq. (9) with the $l_{1}$-norm gives a convex problem whose solution approximates that of Eq. (9):

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\| \varvec{c} \right\|_{1} \quad {\text{subject}}\,{\text{to}} \quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \le \varepsilon$$

(16)

where $\left\| \cdot \right\|_{1}$ is the $l_{1}$-norm defined as $\left\| \varvec{c} \right\|_{1} = \sum\nolimits_{i} {\left| {c_{i} } \right|}$. Equation (16) defines basis pursuit denoising. Donoho [19] has proven that under certain conditions, i.e., when the solution is sparse enough, the solution to Eq. (16) coincides with the solution to Eq. (9).

Equation (16) can also be written in the unconstrained form

$${\hat{\varvec{c}}} = \arg \mathop {\hbox{min} }\limits_{\varvec{c}} J(\varvec{c}),\quad J(\varvec{c}) = \frac{1}{2}\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2} + \lambda \left\| \varvec{c} \right\|_{1}$$

(17)

where $\left\| \cdot \right\|_{2}^{2}$ is the squared $l_{2}$-norm defined as $\left\| \varvec{c} \right\|_{2}^{2} = \sum\nolimits_{i} {\left| {c_{i} } \right|^{2} }$, and $\lambda$ is a scalar regularization parameter which balances the tradeoff between the reconstruction error and the sparsity.

There are two terms on the right side of Eq. (17): the data fidelity term $\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2}$ and the penalty term $\left\| \varvec{c} \right\|_{1}$. To solve Eq. (17), one can build the algorithm around either the data fidelity term or the penalty term. The detailed optimization methods are introduced in Sect. 4.

3 Over-Complete Wavelet Basis Dictionary

3.1 General Over-Complete Wavelet Basis Dictionary

The key point of mechanical fault feature extraction is to construct an appropriate over-complete wavelet basis dictionary. As is well known, the more similar the wavelet basis is to the fault signal, the sparser the representation coefficients are. In practice, the suitable wavelet basis varies from one application to another. Generally, single-sided wavelets are used to construct the basis matrix for bearing fault signature extraction [20]; double-sided wavelets are usually used for faulty gear vibration signal processing [21]; and chirp signals are often used for radar signal analysis [22], etc. In the literature, the most widely used wavelet bases for mechanical fault feature extraction are the Laplace wavelet and the Morlet wavelet, which are introduced in detail in the following.

(a) Laplace wavelet basis

The Laplace wavelet is a complex, analytic, single-sided damped exponential wavelet. It was first constructed by Strang G in 1996 [23]. Since its waveform is similar to the vibration impulse caused by bearing faults, the Laplace wavelet is usually selected to construct the over-complete dictionary for bearing fault detection. As one of the most popular non-orthogonal wavelets, the real-valued form of the Laplace wavelet is defined as

$$\psi \left( {f,\zeta ,\tau ,t} \right) = \psi_{\gamma } \left( t \right) = \left\{ {\begin{array}{*{20}l} {A{\text{e}}^{{\frac{ - \zeta }{{\sqrt {1 - \zeta^{2} } }}2\pi f\left( {t - \tau } \right)}} \sin 2\pi f\left( {t - \tau } \right)} \hfill & { \, t \in \left[ {\tau ,\tau + W_{s} } \right]} \hfill \\ 0 \hfill & {\text{else}} \hfill \\ \end{array} } \right.$$

(18)

where the parameter vector $\gamma = \left( {f,\zeta ,\tau } \right)$ determines the wavelet properties. These parameters denote the frequency $f \in \varvec{R}^{ + }$, the damping ratio $\zeta \in [0,1) \subset \varvec{R}^{ + }$, and the time index $\tau \in \varvec{R}$, respectively. The coefficient $A$ is an arbitrary scaling factor used to scale each wavelet to unit norm. The range $W_{s}$ ensures that the wavelet is compactly supported and has nonzero finite length, but the parameter $W_{s}$ is generally not explicitly expressed.
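A sampled Laplace wavelet atom following Eq. (18) can be generated as sketched below. This is an illustrative example assuming NumPy; the function name `laplace_wavelet`, the explicit `width` argument standing in for $W_{s}$, the sampling rate and the parameter values are assumptions made for demonstration.

```python
import numpy as np

def laplace_wavelet(t, f, zeta, tau, width):
    """Real Laplace wavelet of Eq. (18), scaled to unit l2-norm on the grid t."""
    dt = t - tau
    support = (dt >= 0) & (dt <= width)          # t in [tau, tau + W_s]
    decay = np.exp(-zeta / np.sqrt(1 - zeta ** 2) * 2 * np.pi * f * dt[support])
    atom = np.zeros_like(t, dtype=float)         # zero outside the support
    atom[support] = decay * np.sin(2 * np.pi * f * dt[support])
    norm = np.linalg.norm(atom)
    return atom / norm if norm > 0 else atom     # A chosen to give unit norm

fs = 12000.0                      # assumed sampling rate in Hz
t = np.arange(600) / fs           # 50 ms time axis
atom = laplace_wavelet(t, f=2000.0, zeta=0.05, tau=0.01, width=0.02)
```

The atom is zero before $\tau$, oscillates at frequency $f$ with an exponentially decaying envelope governed by $\zeta$, and is truncated after $W_{s}$. This single-sided shape is what makes it resemble a bearing fault impulse.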

(b) Morlet wavelet basis

Morlet wavelet is one of the most popular non-orthogonal wavelets, defined in the time domain as a harmonic wave multiplied by a Gaussian time domain window

$$\psi \left( t \right) = \exp \left( { - \frac{{\beta^{2} t^{2} }}{2}} \right)\cos \left( {\pi t} \right)$$

(19)

It is a cosine signal whose envelope decays symmetrically on both the left and right sides, which makes its shape very similar to an impulse caused by a localized gear defect at a constant speed. Therefore, the Morlet wavelet is often selected to build the over-complete dictionary when extracting gear fault features. To reduce the complexity, the parametric formulation of the Morlet wavelet is given as

$$\psi \left( {f,\zeta ,\tau ,t} \right) = \psi_{\gamma } \left( t \right) = A{\text{e}}^{{\frac{ - \zeta }{{\sqrt {1 - \zeta^{2} } }}[2\pi f\left( {t - \tau } \right)]^{2} }} \cos 2\pi f\left( {t - \tau } \right) \,$$

(20)

where the parameter vector $\gamma = \left( {f,\zeta ,\tau } \right)$ also determines the wavelet properties. These parameters $\left( {f,\zeta ,\tau } \right)$ denote frequency $f \in \varvec{R}^{ + }$, damping ratio $\zeta \in \left[ {0,1} \right) \subset \varvec{R}^{ + }$, and time index $\tau \in \varvec{R}$, respectively. The parameter A is used to normalize the wavelet function.

Let the discrete parameters $f$, $\zeta$ and $\tau$ take values in the finite sets $F$, $Z$ and $T_{c}$, respectively, where

$$\begin{aligned} F & = \left\{ {f_{1} ,f_{2} , \ldots ,f_{i} } \right\} \subset R^{ + } \\ Z & = \left\{ {\zeta_{1} ,\zeta_{2} , \ldots ,\zeta_{j} } \right\} \subset R^{ + } \cap \left[ {0,1} \right) \\ T_{c} & = \left\{ {\tau_{1} ,\tau_{2} , \ldots ,\tau_{k} } \right\} \subset R \\ \end{aligned}$$

(21)

With these parameter sets, the dictionary can be constructed using the following equation

$$\Psi = \left\{ {\psi_{\gamma } \left( t \right){:} \, \gamma \in F \times Z \times T_{c} } \right\} = \left\{ {\psi \left( {f,\zeta ,\tau ,t} \right){:} \, f \in F,\zeta \in Z,\tau \in T_{c} } \right\}$$

(22)

Each item in the dictionary is called an atom. In this way, the over-complete dictionary has been constructed systematically.
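The construction of Eq. (22) amounts to generating one atom for every parameter triple in $F \times Z \times T_{c}$ and stacking the atoms as columns. The sketch below assumes NumPy and uses the parametric Morlet wavelet of Eq. (20); the function names `morlet_atom` and `build_dictionary` and all grid values are illustrative assumptions.

```python
import itertools
import numpy as np

def morlet_atom(t, f, zeta, tau):
    """Parametric Morlet wavelet of Eq. (20), scaled to unit l2-norm."""
    phase = 2 * np.pi * f * (t - tau)
    atom = np.exp(-zeta / np.sqrt(1 - zeta ** 2) * phase ** 2) * np.cos(phase)
    return atom / np.linalg.norm(atom)

def build_dictionary(t, F, Z, Tc, atom_fn):
    """Stack one atom per (f, zeta, tau) in F x Z x Tc as dictionary columns."""
    params = list(itertools.product(F, Z, Tc))
    Psi = np.column_stack([atom_fn(t, f, z, tau) for f, z, tau in params])
    return Psi, params

fs = 2000.0                        # assumed sampling rate in Hz
t = np.arange(200) / fs            # 200 samples
F = [100.0, 200.0, 300.0]          # illustrative frequency grid in Hz
Z = [0.01, 0.05]                   # illustrative damping ratios
Tc = t[::2]                        # every second sample as a time shift

Psi, params = build_dictionary(t, F, Z, Tc, morlet_atom)
```

With these grids the dictionary has 200 rows and 600 columns, so it is over-complete in the sense of Sect. 2.1. Finer grids give a richer dictionary at the cost of more memory and computation.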

3.2 Correlation Filtering

Once a suitable wavelet basis (Laplace wavelet, Morlet wavelet or others) has been chosen, correlation filtering is applied to identify the optimal wavelet atom, i.e. the atom whose parameters $\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)$ make it most similar to the transient impulses caused by a localized fault.

Correlation, measured by inner product operation, is defined to quantify the degree of similarity between the wavelet basis and the original signal. The correlation function $c_{\gamma }$ is defined to calculate the correlation degree between the basis $\psi_{\gamma } \left( t \right)$ and the original signal $x\left( t \right)$

$$c_{\gamma } = \cos \theta = \frac{{\left| {\left\langle {\psi_{\gamma } \left( t \right),x\left( t \right)} \right\rangle } \right|}}{{\left\| {\psi_{\gamma } \left( t \right)} \right\|_{2} \left\| {x\left( t \right)} \right\|_{2} }}$$

(23)

where $\theta$ is the angle between $\psi_{\gamma } \left( t \right)$ and $x\left( t \right)$. The smaller the angle is, the more similar the basis $\psi_{\gamma } \left( t \right)$ and the original signal are. Therefore, the optimal wavelet atom with optimal parameters $\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)$ can be obtained by maximizing the correlation function $c_{\gamma }$ at each time value over the constructed Laplace wavelet or Morlet wavelet dictionary. The peak of $c_{\gamma }$ for a given time value $\tau$ can be represented as

$$k_{r} \left( \tau \right) = \mathop {\hbox{max} }\limits_{{f \in {\mathbf{F, }}\zeta \in {\mathbf{Z}}}} c_{\gamma } = c\left( {\bar{f},\bar{\zeta },\tau } \right)$$

(24)

and the time index parameter $\bar{\tau }$ is then obtained by maximizing the coefficient $k_{r} \left( \tau \right)$ over $\tau$. Once correlation filtering has found the optimal parameters $\left( {\bar{f},\bar{\zeta },\bar{\tau }} \right)$, the optimal wavelet atom with these parameters can be constructed.
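Correlation filtering as described by Eqs. (23) and (24) can be sketched as an exhaustive grid search. The example below assumes NumPy, uses a Laplace atom that extends to the end of the record (no explicit $W_{s}$), and normalizes by the plain $l_{2}$-norms so that the correlation is a cosine; the function names, the synthetic signal, the noise level and the grids are illustrative assumptions.

```python
import numpy as np

def laplace_atom(t, f, zeta, tau):
    """Single-sided damped sinusoid starting at tau (zero before tau)."""
    dt = np.clip(t - tau, 0.0, None)             # sin(0) = 0 before onset
    return np.exp(-zeta / np.sqrt(1 - zeta ** 2) * 2 * np.pi * f * dt) * np.sin(2 * np.pi * f * dt)

def correlation(atom, x):
    """|<atom, x>| / (||atom|| ||x||), the cosine of the angle in Eq. (23)."""
    denom = np.linalg.norm(atom) * np.linalg.norm(x)
    return abs(np.dot(atom, x)) / denom if denom > 0 else 0.0

def correlation_filter(x, t, F, Z, Tc):
    """Grid search for the (f, zeta, tau) that maximizes the correlation."""
    best_c, best_params = 0.0, None
    for tau in Tc:
        for f in F:
            for zeta in Z:
                c = correlation(laplace_atom(t, f, zeta, tau), x)
                if c > best_c:
                    best_c, best_params = c, (f, zeta, tau)
    return best_c, best_params

fs = 4000.0                                      # assumed sampling rate in Hz
t = np.arange(400) / fs
rng = np.random.default_rng(1)
# Synthetic transient (500 Hz, zeta = 0.08, onset at t[80]) plus weak noise.
y = laplace_atom(t, 500.0, 0.08, t[80]) + 0.02 * rng.standard_normal(t.size)

F = [300.0, 400.0, 500.0, 600.0]
Z = [0.02, 0.05, 0.08, 0.12]
Tc = t[::10]
c_max, (f_bar, zeta_bar, tau_bar) = correlation_filter(y, t, F, Z, Tc)
```

The cost of the search grows with the product of the three grid sizes. For long records, the inner products over $\tau$ are a cross-correlation and could be computed more efficiently than with this triple loop.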

4 Solution to Representation Coefficients Based on BPDN

4.1 Data Fidelity Optimization

To represent the fault transients by sparse coefficients, the basis pursuit denoising problem defined by Eq. (17) should be solved: only after the objective function in Eq. (17) has been minimized is a sparse representation vector $\varvec{c}$ obtained. To minimize $J(\varvec{c})$, an iterative algorithm is introduced. Traditional gradient-based methods, such as the iterative shrinkage/thresholding algorithm (ISTA) [24] and the fast iterative shrinkage/thresholding algorithm (FISTA) [25], have the drawback of slow convergence. To improve the speed of convergence, Afonso et al. proposed the split augmented Lagrangian shrinkage algorithm (SALSA), which uses the Hessian of the data fidelity term [26]. The algorithm updates the vector $\varvec{c}$ until the optimal solution ${\hat{\varvec{c}}}$ is reached.

Considering the unconstrained optimization problem in which the objective function is the summation of two functions, the Eq. (17) can be written as

$$\mathop {\hbox{min} }\limits_{\varvec{c}} \left\{ {f_{1} \left( \varvec{c} \right) + f_{2} \left( \varvec{c} \right)} \right\}$$

(25)

where $f_{1} \left( \varvec{c} \right) = \frac{1}{2}\left\| {\varvec{y} - \varvec{Ac}} \right\|_{2}^{2}$, $f_{2} \left( \varvec{c} \right) = \lambda \left\| \varvec{c} \right\|_{1}$. Variable splitting is then introduced: a new variable $\varvec{u}$ is created to serve as the argument of $f_{1}$, under the constraint that $\varvec{u} = \varvec{c}$. This leads to the constrained problem

$$\mathop {\hbox{min} }\limits_{{\varvec{u},\varvec{c}}} \left\{ {f_{1} \left( \varvec{u} \right) + f_{2} (\varvec{c})} \right\} \quad {\text{s}}.{\text{t}}. \quad \varvec{u} = \varvec{c}$$

(26)

which is clearly equivalent to the unconstrained problem in Eq. (25). This problem is an instance of the linearly constrained problem to which the augmented Lagrangian method applies

$$\mathop {\hbox{min} }\limits_{z} \, E\left( \varvec{z} \right)\quad {\text{ s}} . {\text{t}} .\quad \varvec{Hz} - \varvec{b} = {\mathbf{0}}$$

(27)

where $E\left( \varvec{z} \right) = f_{1} \left( \varvec{u} \right) + f_{2} \left( \varvec{c} \right)$, $\varvec{z} = \begin{bmatrix} \varvec{u} \\ \varvec{c} \end{bmatrix}$, $\varvec{b} = {\mathbf{0}}$, $\varvec{H} = [\varvec{I} \;\; - \varvec{I}]$. The augmented Lagrangian function for this problem is defined as

$$L\left( {\varvec{z},\lambda ,\mu } \right) = E\left( \varvec{z} \right) + \lambda^{T} (\varvec{Hz} - \varvec{b}) + \frac{\mu }{2}\left\| {\varvec{Hz} - \varvec{b}} \right\|_{2}^{2}$$

(28)

where $\lambda$ here denotes a vector of Lagrange multipliers (not the scalar regularization parameter of Eq. (17)) and $\mu \ge 0$ is the penalty parameter. Applying the augmented Lagrangian method (ALM) to minimize $L\left( {\varvec{z},\lambda ,\mu } \right)$ gives the iteration

$$\varvec{z}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{z} \left\{ {E\left( \varvec{z} \right) + \frac{\mu }{2}\left\| {\varvec{Hz} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$

(29)

$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{(k)} - (\varvec{Hz}^{(k + 1)} - \varvec{b})$$

(30)

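where $k$ is the iteration counter and $\varvec{d}^{(k)}$ is an auxiliary vector that absorbs the scaled Lagrange multipliers. Substituting the concrete forms of the function $E(\varvec{z})$, the matrix $\varvec{H}$ and the vector $\varvec{b}$, and minimizing over $\varvec{u}$ and $\varvec{c}$ alternately, these updates can be written as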

$$\varvec{u}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{\varvec{u}} \left\{ {f_{1} \left( \varvec{u} \right) + \frac{\mu }{2}\left\| {\varvec{u} - \varvec{c}^{\left( k \right)} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\} = \arg \mathop {\text{min}}\limits_{\varvec{u}} \left\{ {\frac{1}{2}\left\| {\varvec{y} - \varvec{Au}} \right\|_{2}^{2} + \frac{\mu }{2}\left\| {\varvec{u} - \varvec{c}^{\left( k \right)} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$

(31)

$$\varvec{c}^{(k + 1)} = \arg \mathop {\text{min}}\limits_{\varvec{c}} \left\{ {f_{2} \left( \varvec{c} \right) + \frac{\mu }{2}\left\| {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{c} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\} = \arg \mathop {\text{min}}\limits_{c} \left\{ {\lambda \left\| \varvec{c} \right\|_{1} + \frac{\mu }{2}\left\| {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{c} - \varvec{d}^{\left( k \right)} } \right\|_{2}^{2} } \right\}$$

(32)

$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{(k)} - (\varvec{u}^{(k + 1)} - \varvec{c}^{{\left( {k + 1} \right)}} )$$

(33)

Equation (31) is the minimization of a strictly convex quadratic function, which yields $\varvec{u}^{(k + 1)}$ in closed form, and Eq. (32) is solved by soft thresholding, $soft(x,T) = {\text{sign}}(x)\max (\left| x \right| - T,0)$, applied element-wise. The iteration procedure of SALSA can therefore be listed as

$$\varvec{u}^{{\left( {k + 1} \right)}} = \left( {\varvec{A}^{H} \varvec{A} + \mu \varvec{I}} \right)^{ - 1} \left( {\varvec{A}^{H} \varvec{y} + \mu (\varvec{c}^{\left( k \right)} + \varvec{d}^{\left( k \right)} )} \right)$$

(34)

$$\varvec{c}^{{\left( {k + 1} \right)}} = soft\left( {\varvec{u}^{{\left( {k + 1} \right)}} - \varvec{d}^{\left( k \right)} ,\frac{\lambda }{\mu }} \right)$$

(35)

$$\varvec{d}^{{\left( {k + 1} \right)}} = \varvec{d}^{\left( k \right)} - \varvec{u}^{{\left( {k + 1} \right)}} + \varvec{c}^{{\left( {k + 1} \right)}}$$

(36)

With the iterative numerical algorithm SALSA, the optimal sparse solution ${\hat{\varvec{c}}}$ is eventually obtained. With the sparse solution ${\hat{\varvec{c}}}$, the reconstructed signal is $\hat{\varvec{x}} = \varvec{A}{{\hat{\varvec{c}}}}$. The non-zero coefficients in ${\hat{\varvec{c}}}$ occur periodically and represent the transients in the original signal.
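The three updates of Eqs. (34) to (36) translate almost line by line into code. The following is a minimal sketch, assuming NumPy and a real-valued dictionary (so that $\varvec{A}^{H} = \varvec{A}^{T}$); the function names `soft` and `salsa_bpd`, the random dictionary, and the values of $\lambda$, $\mu$ and the iteration count are illustrative assumptions, not settings from the chapter.

```python
import numpy as np

def soft(x, T):
    """Element-wise soft threshold: sign(x) * max(|x| - T, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - T, 0.0)

def salsa_bpd(y, A, lam, mu, n_iter=200):
    """SALSA iteration of Eqs. (34)-(36) for the cost of Eq. (17), real A."""
    M = A.shape[1]
    c = np.zeros(M)
    d = np.zeros(M)
    G = np.linalg.inv(A.T @ A + mu * np.eye(M))   # fixed across iterations
    Aty = A.T @ y
    cost = []
    for _ in range(n_iter):
        u = G @ (Aty + mu * (c + d))              # Eq. (34)
        c = soft(u - d, lam / mu)                 # Eq. (35)
        d = d - u + c                             # Eq. (36)
        cost.append(0.5 * np.sum((y - A @ c) ** 2) + lam * np.sum(np.abs(c)))
    return c, np.array(cost)

# Synthetic test: three active atoms in a random unit-norm dictionary.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
A /= np.linalg.norm(A, axis=0)
c_true = np.zeros(128)
c_true[[5, 40, 90]] = [2.0, -1.5, 1.5]
y = A @ c_true + 0.01 * rng.standard_normal(64)

c_hat, cost = salsa_bpd(y, A, lam=0.1, mu=1.0)
x_hat = A @ c_hat                                 # reconstructed signal
```

The matrix $\left( {\varvec{A}^{H} \varvec{A} + \mu \varvec{I}} \right)^{ - 1}$ does not change between iterations, so it is computed once. For large dictionaries a factorization or a fast transform would normally replace the explicit inverse used in this sketch.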

4.2 Penalty Optimization

Unlike SALSA, the majorization-minimization (MM) algorithm focuses on the penalty term when solving Eq. (17). Based on majorization of the non-quadratic penalty, the MM algorithm replaces the original ill-posed inverse problem with a sequence of simpler convex optimization problems, and it is an effective and widely applicable method [27].

The function $J(c)$ would be easy to minimize if it were quadratic. The MM algorithm exploits this by solving a series of simpler minimization problems

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} G_{k} \left( c \right)$$

(37)

where k is the iteration counter, k = 1, 2, 3, …. The MM algorithm requires that each function $G_{k} \left( c \right)$ should be a majorizer (upper bound) of J(c) and it coincides with J(c) at $c = c_{k}$. That is

$$\begin{aligned} & \forall c,G_{k} \left( c \right) \ge J\left( c \right) \\ & G_{k} \left( {c_{k} } \right) = J\left( {c_{k} } \right) \\ \end{aligned}$$

(38)

The majorizer should be chosen so as to be easier to minimize. Considering the data fidelity term in the cost function (17) is strictly quadratic, we simply need to majorize the penalty term.

We denote the penalty $\left\| c \right\|_{1}$ by $\Psi \left( c \right)$ and the scalar term $\left| c \right|$ by $\phi \left( c \right)$. Hence, $\Psi \left( c \right) = \sum\nolimits_{n = 1}^{N} {\left| {c\left( n \right)} \right|} = \sum\nolimits_{n = 1}^{N} {\phi \left( {c\left( n \right)} \right)}$. Since $\phi \left( c \right)$ is the absolute value function, it is non-differentiable at zero and not strictly convex, which makes Eq. (17) difficult to solve directly. According to the MM conditions in Eq. (38), $\phi \left( c \right)$ can be majorized by a quadratic function $g\left( c \right)$ of the general form

$$g\left( c \right) = mc^{2} + nc + b$$

(39)

where the parameters m, n and b are constants, the majorizer $g\left( c \right)$ should be the upper bound for $\phi \left( c \right)$ that coincides with $\phi \left( c \right)$ at a specified point $c_{k}$. For this quadratic majorizer, conditions in Eq. (38) are equivalent to

$$\begin{aligned} g\left( {c_{k} } \right) & = \phi \left( {c_{k} } \right) \\ g^{\prime } \left( {c_{k} } \right) & = \phi^{\prime } \left( {c_{k} } \right) \\ \end{aligned}$$

(40)

Solving for m and b gives $m = \left( {\phi^{\prime} \left( {c_{k} } \right)/2c_{k} } \right) - \left( {n/2c_{k} } \right)$, $b =\phi \left( {c_{k} } \right) - \left( {c_{k} /2} \right)\phi^{\prime} \left( {c_{k} } \right) - \left( {n/2} \right)c_{k}$, thus leading to the majorizer $g\left( c \right)$ in Eq. (39) given by

$$g\left( c \right) = \left( {\frac{{\phi^{\prime } \left( {c_{k} } \right)}}{{2c_{k} }} - \frac{n}{{2c_{k} }}} \right)c^{2} + nc + \left( {\phi \left( {c_{k} } \right) - \frac{{c_{k} }}{2}\phi^{\prime } \left( {c_{k} } \right) - \frac{n}{2}c_{k} } \right)$$

(41)

As a special case, we set the free parameter $n = 0$, so that $g\left( c \right)$ is symmetric about the origin like $\phi \left( c \right)$; the parameters $m$ and $b$ then become $m = \left( {\phi^{\prime } \left( {c_{k} } \right)/2c_{k} } \right)$, $b = \phi \left( {c_{k} } \right) - \left( {c_{k} /2} \right)\phi^{\prime } \left( {c_{k} } \right)$, and $g\left( c \right)$ turns out to be:

$$g\left( c \right) = \frac{{\phi^{\prime } \left( {c_{k} } \right)}}{{2c_{k} }}c^{2} + \phi \left( {c_{k} } \right) - \frac{{c_{k} }}{2}\phi^{\prime } \left( {c_{k} } \right)$$

(42)

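Taking the concrete form $\phi \left( c \right) = \left| c \right|$ into consideration, Eq. (42) becomes $g\left( c \right) = c^{2} /\left( {2\left| {c_{k} } \right|} \right) + \left| {c_{k} } \right|/2$ for $c_{k} \ne 0$. Summing this majorizer over all coefficients and adding the data fidelity term gives the majorizer of $J\left( c \right)$ in matrix form: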

$$G_{k} \left( c \right) = \frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \lambda \left( {\frac{1}{2}c^{*}\Lambda _{k}^{ - 1} c + \frac{1}{2}\left\| {c_{k} } \right\|_{1} } \right)$$

(43)

where $\Lambda _{k}$ denotes the diagonal matrix with the vector $\left| {c_{k} } \right|$ along its diagonal. The MM update (37) then becomes:

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} \left[ {\frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \lambda \left( {\frac{1}{2}c^{*}\Lambda _{k}^{ - 1} c{ + }\frac{1}{2}\left\| {c_{k} } \right\|_{1} } \right)} \right]$$

(44)

The last term in Eq. (44) can be omitted because it does not depend on $c$; the update equation then simplifies to:

$$c_{k + 1} = \arg \mathop {\hbox{min} }\limits_{c} \frac{1}{2}\left\| {y - Ac} \right\|_{2}^{2} + \frac{\lambda }{2}c^{*} {\Uplambda }_{k}^{ - 1} c$$

(45)

Equation (45) is quadratic in terms of c, so the solution to this problem can be written explicitly using linear algebra as:

$$c_{k + 1} = \left( {A^{*} A + \lambda \Lambda _{k}^{ - 1} } \right)^{ - 1} A^{*} y$$

(46)

Because c is sparse, many entries of $c_{k}$ approach zero as the iteration proceeds, so the corresponding elements of $\Lambda _{k}^{ - 1}$ tend towards infinity. To avoid this problem, the matrix inversion lemma is applied, after which the update equation can be expressed as:

$$c_{k + 1} = \frac{1}{\lambda }\Lambda _{k} \left[ {A^{*} y - A^{*} \left( {A\Lambda _{k} A^{*} + \lambda I} \right)^{ - 1} A\Lambda _{k} A^{*} y} \right]$$

(47)

By iterating Eq. (47), the MM algorithm finds the optimal sparse solution ${\hat{\varvec{c}}}$, which is used to represent the fault feature. With the sparse solution ${\hat{\varvec{c}}}$, the reconstructed signal is $\hat{x} = \varvec{A} \hat {\varvec{c}}$.
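
The update in Eq. (47) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a real-valued dictionary (so $A^{*}$ is the transpose), a fixed number of iterations, and an initial guess $c_{0} = A^{*} y$; the function name `mm_l1` and these choices are hypothetical.

```python
import numpy as np

def mm_l1(A, y, lam, n_iter=50):
    """Iterate Eq. (47) for 0.5*||y - A c||_2^2 + lam*||c||_1 (real-valued case)."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    M = A.shape[0]
    Aty = A.T @ y                      # A* y
    c = Aty.copy()                     # assumed initial guess, not from the chapter
    for _ in range(n_iter):
        lam_k = np.abs(c)              # diagonal of Lambda_k
        # F = A Lambda_k A* + lam I stays well defined when entries of c reach zero
        F = (A * lam_k) @ A.T + lam * np.eye(M)
        # Eq. (47): c = (1/lam) Lambda_k [A*y - A* F^{-1} A Lambda_k A*y]
        c = lam_k * (Aty - A.T @ np.linalg.solve(F, A @ (lam_k * Aty))) / lam
    return c

# Sanity check with an identity dictionary, where the minimiser is soft thresholding
c_hat = mm_l1(np.eye(3), np.array([3.0, 0.5, -3.0]), lam=1.0, n_iter=100)
```

With the identity dictionary the iteration converges to the soft-thresholded values of `y` (here approximately 2, 0 and -2), which is a convenient way to check an implementation before using a wavelet dictionary. Only the M-by-M system in `F` is solved, so $\Lambda _{k}^{ - 1}$ is never formed.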

5 Applications

5.1 Application in Gearbox Transient Feature Extraction

To verify the effectiveness of the proposed methods for gearbox fault diagnosis, experimental data were acquired from an automobile transmission gearbox with five forward speeds and one reverse speed. The structure of the gearbox is shown in Fig. 1. During the test, a broken-tooth fault occurred on the driving gear of the third speed. The vibration signal was acquired by an accelerometer mounted on the outer case of the gearbox while the gearbox was running under load in the third speed.

For a gear transmission, the meshing frequency $f_{m}$ is calculated by

$$f_{m} = \frac{nz}{60i}$$

(48)

where $z$ is the number of gear teeth, $n$ is the rotating speed of the input shaft in rpm, and $i$ is the transmission ratio. In this test, n was set to 1600 ± 16 rpm, and the nominal value of 1600 rpm is used as the input shaft speed. The meshing frequency of the third speed is then calculated to be 500 Hz. The sampling frequency was set to 3000 Hz. The working parameters are shown in Table 2.
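
The arithmetic around Eq. (48) can be illustrated as follows. The ratio `Z_OVER_I = 18.75` is an assumption back-computed from the 500 Hz and 1600 rpm quoted above, not a value read from Table 2, and the function name is hypothetical. The sketch shows how the ±16 rpm speed tolerance maps to the meshing frequency and how many samples one meshing period spans at 3000 Hz.

```python
def meshing_frequency(n_rpm, z, i):
    """Eq. (48): meshing frequency in Hz for input speed n_rpm, tooth count z, ratio i."""
    return n_rpm * z / (60.0 * i)

# Assumption: z / i = 18.75 is inferred from f_m = 500 Hz at n = 1600 rpm,
# it is not taken from Table 2.
Z_OVER_I = 18.75

f_nominal = meshing_frequency(1600.0, Z_OVER_I, 1.0)   # nominal speed
f_low = meshing_frequency(1584.0, Z_OVER_I, 1.0)       # 1600 - 16 rpm
f_high = meshing_frequency(1616.0, Z_OVER_I, 1.0)      # 1600 + 16 rpm

fs = 3000.0                                # sampling frequency in Hz
samples_per_mesh_period = fs / f_nominal   # samples covering one meshing period
record_length_s = 900 / fs                 # duration of a 900-sample record

print(f_low, f_nominal, f_high, samples_per_mesh_period, record_length_s)
```

Under these assumptions the speed tolerance corresponds to a band of roughly 495 to 505 Hz, one meshing period covers 6 samples, and a 900-sample record lasts 0.3 s.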

Table 2 Working parameters of the third speed gears

(a) Sparsity-based fault feature extraction by optimizing the data fidelity term

A measured vibration signal with a length of 900 samples and its Fourier spectrum are shown in Fig. 2a, b. The fault feature of the gearbox vibration signal cannot be identified from Fig. 2a. From Fig. 2b, the main frequency component can be identified as 500 Hz, which is in fact the meshing frequency.

Because the Morlet wavelet resembles the impulse caused by a localized gear defect, it is selected as the atom for constructing the over-complete dictionary. Once the dictionary is constructed, the sparse representation model can be established and solved with the SALSA algorithm. Figure 3 presents the analysis result of the vibration signal obtained by the proposed method. Figure 3a displays the optimal wavelet atom determined by correlation filtering. The related parameters are $\bar{f} = 272$ Hz, $\bar{\zeta } = 0.0074$ and $\bar{\tau } = 0.0633$ s. The first N elements of the representation coefficient vector ${\hat{\mathbf{c}}}$ are given in Fig. 3b. A threshold of $3\sigma$ is used to filter out the small values and extract the principal components; the final estimated vector $\hat{\mathbf{c}}^{\prime}$ is illustrated in Fig. 3c. The cyclic period $\hat{T} = 50.00$ ms can be easily identified in Fig. 3c, which is consistent with the theoretical value $T = 50.00$ ms. The impulse occurrence times can also be identified from Fig. 3c. The parameter values used in this case are listed in Table 3.
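
The thresholding and period-reading step described above might look like the following sketch. It assumes that $\sigma$ is the standard deviation of the coefficient vector (the text does not define it) and uses synthetic coefficients with one spike every 150 samples at 3000 Hz; the function names and test data are hypothetical.

```python
import statistics

def three_sigma_threshold(coeffs):
    """Zero every coefficient whose magnitude is below 3*sigma (sigma assumed = std of coeffs)."""
    sigma = statistics.pstdev(coeffs)
    return [c if abs(c) >= 3.0 * sigma else 0.0 for c in coeffs]

def mean_period(coeffs, fs):
    """Average spacing in seconds between consecutive non-zero coefficients."""
    idx = [i for i, c in enumerate(coeffs) if c != 0.0]
    gaps = [b - a for a, b in zip(idx, idx[1:])]
    return sum(gaps) / len(gaps) / fs

# Synthetic coefficient vector: small background values plus one spike every 150 samples
fs = 3000.0
coeffs = [0.01 * (-1) ** i for i in range(900)]
for start in range(20, 900, 150):
    coeffs[start] = 1.0

kept = three_sigma_threshold(coeffs)
period = mean_period(kept, fs)
print(sum(1 for c in kept if c != 0.0), period)
```

For this synthetic input only the six spikes survive the threshold, and their average spacing corresponds to a 50 ms period.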

Table 3 Conclusion of all parameters of transient components in the vibration signal of faulty gearbox

The removed small coefficients, which represent the meshing component, are shown in Fig. 3d. The meshing period $T_{0} = 0.002$ s can be observed in Fig. 3d, indicating a meshing frequency of 500 Hz, which is consistent with the theoretical meshing frequency of 500 Hz. This result shows that the proposed method is effective in identifying the impulse occurrence time and the period parameter.

A comparison between the reconstructed impulse responses and the original signal is presented in Fig. 4. The interval between the transients is consistent with the rotating period of the third speed gears, which indicates a localized fault in the third speed gear of the gearbox. Inspection during overhaul confirmed that the driving gear of the third speed was broken.

(b) Sparsity-based fault feature extraction by optimizing the penalty term

Another vibration signal with a length of 900 samples was measured on the same gearbox; the signal and its frequency spectrum are shown in Fig. 5a, b. From Fig. 5a, the impulse period cannot be identified because of the noise corruption; from Fig. 5b, the frequency of the main component can be identified as 500 Hz.

Firstly, the optimal Morlet wavelet atom is obtained by correlation filtering and used to construct the over-complete dictionary. The sparse representation model is then built, and the MM algorithm is applied to solve it by optimizing the penalty term. The associated parameters of the optimal wavelet atom are $\bar{f} = 272$ Hz, $\bar{\zeta } = 0.0074$ and $\bar{\tau } = 0.0633$ s. Figure 6a shows the sparse coefficients obtained by the proposed method, which represent a series of periodic impulses. The average period of these impulses is around 0.0505 s, which is very close to the theoretical value of 0.050 s. Figure 6b illustrates the reconstructed signal, whose periodic features reflect the localized fault in the driving gear of the third speed. The analysis results demonstrate that the proposed transient sparse representation method can extract the transients and reduce the noise effectively, so that the machinery condition can be identified.

5.2 Application in Bearing Transient Feature Extraction

To verify the effectiveness of the proposed method for bearing fault diagnosis, experimental data were acquired from the rotating machine test rig shown in Fig. 7, with a sampling frequency of 51.2 kHz. The test rig consists of a driving motor and a shaft, which is driven by the motor and supported by two bearing blocks. The bearing used in this test is an NJ208 (TMB) cylindrical roller bearing. Details of the geometry and fault frequencies of this type of bearing can be found in Table 4. In this test, the fault frequencies of the outer race, the inner race and the rolling element are 142.8 Hz (7.003 ms), 206.3 Hz (4.847 ms) and 132.6 Hz (7.541 ms), respectively, when the shaft rotates at 1496 RPM.

Table 4 The geometry and fault frequencies of bearings

(c) Sparsity-based fault feature extraction by optimizing the data fidelity term

The measured outer race fault vibration signal with 4096 samples and its Fourier spectrum are shown in Fig. 8a, b. The fault feature cannot be seen in Fig. 8. Because the Laplace wavelet is morphologically similar to the impulse caused by a localized defect in a rolling bearing, it is selected as the atom for constructing the over-complete dictionary. The sparse representation model can then be established. Solving the model with the SALSA algorithm gives the analysis results in Fig. 9. Figure 9a shows the optimal wavelet atom determined by correlation filtering. The related parameters are $\bar{f} = 3024$ Hz, $\bar{\zeta } = 0.0890$, $\bar{\tau } = 0.0159$ s. The first N elements of the representation coefficient vector ${\hat{\mathbf{c}}}$ are given in Fig. 9b. The cyclic period $\hat{T} = 7.01$ ms can be identified in Fig. 9c, which is consistent with the theoretical value $T = 7.00$ ms. The occurrence time of each impulse can also be identified in Fig. 9c.

The measured inner race fault vibration signal and its Fourier spectrum are shown in Fig. 10a, b. The fault feature cannot be identified from Fig. 10.

Figure 11 exhibits the analysis result of the inner race vibration signal obtained by the proposed method. The optimal wavelet atom obtained by using correlation filtering is shown in Fig. 11a. The related parameters are $\bar{f} = 6402$ Hz, $\bar{\zeta } = 0.1400$, $\bar{\tau } = 0.0467$ s. The first N elements of the representation coefficient vector ${\hat{\mathbf{c}}}$ are presented in Fig. 11b. The cyclic period $\hat{T} = 4.85$ ms can be clearly identified in Fig. 11c, which is consistent with the theoretical value $T = 4.85$ ms.

The measured rolling element fault vibration signal and its Fourier spectrum are shown in Fig. 12a, b.

Figure 13 gives the analysis result of the rolling element vibration signal obtained by the proposed method. The optimal wavelet atom obtained by using correlation filtering is shown in Fig. 13a. The related parameters are $\bar{f} = 3024$ Hz, $\bar{\zeta } = 0.089$, $\bar{\tau } = 0.0159$ s. The first N elements of the representation coefficient vector ${\hat{\mathbf{c}}}$ are displayed in Fig. 13b. The cyclic period $\hat{T} = 7.47$ ms can be identified in Fig. 13c, which is close to the theoretical value $T = 7.54$ ms (a deviation of about 0.9%).
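
The agreement between the estimated and theoretical periods can be checked with a few lines of arithmetic. The sketch below uses only the fault frequencies and the estimated periods quoted in this subsection; the helper names are hypothetical.

```python
# Fault frequencies in Hz quoted in the text for a shaft speed of 1496 RPM
fault_freq_hz = {"outer race": 142.8, "inner race": 206.3, "rolling element": 132.6}
# Cyclic periods in ms reported for Figs. 9c, 11c and 13c
estimated_ms = {"outer race": 7.01, "inner race": 4.85, "rolling element": 7.47}

def period_ms(freq_hz):
    """Convert a fault frequency in Hz to its period in milliseconds."""
    return 1000.0 / freq_hz

def relative_error(estimate, reference):
    """Relative deviation of an estimate from its reference value."""
    return abs(estimate - reference) / reference

errors = {}
for name, freq in fault_freq_hz.items():
    theoretical = period_ms(freq)
    errors[name] = relative_error(estimated_ms[name], theoretical)
    print(f"{name}: theoretical {theoretical:.3f} ms, "
          f"estimated {estimated_ms[name]:.2f} ms, "
          f"relative error {100 * errors[name]:.2f} %")
```

All three deviations stay below 1%, with the rolling element case showing the largest gap.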

(d) Sparsity-based fault feature extraction by optimizing the penalty term

Another group of fault signals with a length of 5120 samples was also measured on the test rig shown in Fig. 7. The over-complete dictionary is constructed using the Laplace wavelet, from which a signal sparse representation model can be established. Then the MM algorithm is applied to solve the model to obtain the representation coefficients. The measured outer race fault vibration signal and its Fourier spectrum are shown in Fig. 14a, b. No information related to bearing defects can be recognized.

Figure 15 gives the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector ${\hat{\mathbf{c}}}$ is given in Fig. 15a, in which the cyclic period $\hat{T} = 7.02$ ms can be discerned. The reconstructed signal is shown in Fig. 15b.

The measured inner race fault vibration signal and its Fourier spectrum are shown in Fig. 16a, b, from which the bearing health condition cannot easily be determined.

Figure 17 shows the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector ${\hat{\mathbf{c}}}$ is given in Fig. 17a, in which the cyclic period $\hat{T} = 4.84$ ms can be discerned. The reconstructed signal is shown in Fig. 17b, in which the impulses are easily observed.

The measured rolling element fault vibration signal and its Fourier spectrum are shown in Fig. 18a, b. No frequency information associated with rolling elements can be recognized in Fig. 18b.

Figure 19 exhibits the analysis result of the vibration signal obtained by the proposed method. The representation coefficient vector ${\hat{\mathbf{c}}}$ is shown in Fig. 19a, where the cyclic period $\hat{T} = 7.51$ ms can be identified. The reconstructed signal shown in Fig. 19b clearly shows the cyclic impulses generated by rolling element defect.

5.3 Application in Compound Fault Feature Extraction

Apart from single fault detection of rotating machinery, compound fault diagnosis has also been gaining attention in recent years. Taking a compound fault in a gearbox as an example, this section applies the sparse representation method to separate and extract the compound fault features of the gearbox.

The test rig, a single stage transmission gearbox on a test bed, is shown in Fig. 20. The gearbox contains both a bearing fault and a gear fault, which are shown in Fig. 21. The faulty gear is a helical gear, whose parameters are listed in Table 5. The bearing used in the experiment is a 30205 tapered roller bearing, and its geometric parameters are listed in Table 6. With the known parameters, the theoretical fault feature frequency of the bearing is calculated as 176.18 Hz.

Table 5 Working parameters of gears in the tested gearbox

Table 6 Geometry of the tested bearing

The measured vibration signal with compound faults is shown in Fig. 22, from which the characteristics of each fault cannot be identified clearly. Thus, the sparse representation method is applied to extract the fault features one by one. The order of extraction takes the propagation path of the signals into consideration: because the sensor is placed on the bearing end cover, which is closer to the faulty bearing, the bearing fault feature is extracted first. The iterative algorithm SALSA is selected as the optimization algorithm. Then, the optimal Laplace wavelet, which is effective in representing bearing fault induced impulses and is determined by correlation filtering, is chosen to construct the over-complete dictionary $\varvec{A}_{1}$ as explained in Sect. 2. The selected Laplace wavelet is shown in Fig. 23a. Incorporating the dictionary $\varvec{A}_{1}$ into the iterative procedure of SALSA, the sparse coefficients ${\hat{\varvec{c}}}_{1}$ of the bearing fault are obtained, as shown in Fig. 23b. The corresponding reconstructed signal, illustrated in Fig. 23c, is obtained by $\hat{x}_{1} = \varvec{A}_{1} \hat{\varvec{c}}_{1}$. To acquire the fault characteristics, envelope spectrum analysis of the reconstructed signal is performed, yielding the result in Fig. 23d. The characteristic frequency of the faulty bearing, 174.1 Hz, can be easily recognized and is close to the theoretical value of 176.18 Hz. Therefore, it can be concluded that the bearing is defective.
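
Envelope spectrum analysis, as used above, can be sketched with NumPy alone. The chapter does not state how the envelope is computed; this sketch assumes the common Hilbert-envelope approach, with the analytic signal built from the FFT. The function name, the sampling rate and the synthetic test signal (decaying 3000 Hz bursts repeated at 100 Hz) are assumptions for illustration and are unrelated to the measured data.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Return (freqs, magnitude) of the spectrum of the Hilbert envelope of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Analytic signal via the FFT: keep DC, double positive bins, zero negative bins
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(X * h))
    envelope -= envelope.mean()                 # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum

# Synthetic signal: exponentially decaying 3000 Hz bursts repeating every 100 samples
fs = 10_000.0
n = 10_000
tau = (np.arange(n) % 100) / fs                 # time since the start of each burst
x = np.exp(-800.0 * tau) * np.sin(2 * np.pi * 3000.0 * tau)

freqs, spec = envelope_spectrum(x, fs)
peak_hz = freqs[np.argmax(spec)]
print(peak_hz)
```

The dominant line of the envelope spectrum sits at the burst repetition rate rather than at the 3000 Hz carrier, which is why the envelope spectrum is a convenient place to read off a fault characteristic frequency.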

The amplitude of each transient impulse caused by the localized bearing fault is represented by the sparse vector ${\hat{\varvec{c}}}_{1}$. In order to estimate the real amplitude of the bearing fault transients, a constrained optimization strategy is proposed that estimates the amplitude of each single fault component by introducing a scaling parameter k. The spectrum of the residual fault signal $x - k\hat{x}_{1}$ is denoted by $F_{1} \left( f \right)$, and k is determined from:

$$\begin{aligned} & \hbox{min} \left\{ {F_{1} \left( f \right)} \right\} \\ & {\text{subject to }}k > 0,f = f_{z1} \, \\ \end{aligned}$$

(49)

where $x$ is the original measured signal, $f_{z1}$ is the peak frequency and k is a positive parameter. When $F_{1} \left( f \right)$ is minimized subject to these constraints, the bearing fault component has been removed from the residual signal to the largest extent. Solving the problem in Eq. (49) gives an optimal value $k_{opt}$, and the estimated bearing fault signal is obtained as $x_{1} = k_{opt} \hat{x}_{1}$. Figure 23e shows the estimated bearing fault component with $k_{opt} = 1.332$.
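
One way to read Eq. (49) is as a one-dimensional search over k that minimizes the residual spectrum magnitude at a single frequency bin. The sketch below makes two assumptions that the text does not confirm: $f_{z1}$ is taken as the dominant spectral peak of the reconstructed component $\hat{x}_{1}$, and the search is a plain grid search. The function name and the synthetic signals are hypothetical.

```python
import numpy as np

def optimal_scale(x, x_hat, k_grid):
    """Grid search for k > 0 minimising |F1(f_z1)|, the spectrum of x - k*x_hat at f_z1."""
    X = np.fft.rfft(x)
    X_hat = np.fft.rfft(x_hat)
    # Assumption: f_z1 is the largest non-DC spectral peak of the reconstructed component
    peak = np.argmax(np.abs(X_hat[1:])) + 1
    residual_at_peak = np.abs(X[peak] - k_grid * X_hat[peak])
    return k_grid[np.argmin(residual_at_peak)]

# Synthetic check: x contains 1.3 times the reconstructed component plus another tone
fs = 1000.0
t = np.arange(1000) / fs
x_hat = np.sin(2 * np.pi * 50.0 * t)
x = 1.3 * x_hat + 0.5 * np.sin(2 * np.pi * 120.0 * t)

k_grid = np.linspace(0.5, 2.0, 151)       # candidate values of k, step 0.01
k_opt = optimal_scale(x, x_hat, k_grid)
print(k_opt)
```

In this synthetic case the search recovers the scale factor of 1.3 that was used to build `x`. A finer grid or a scalar minimizer could replace the grid search without changing the idea.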

After the estimated bearing fault signal is removed from the original signal, the residual signal is as shown in Fig. 24. As in the bearing fault feature extraction, SALSA is first chosen as the optimization algorithm. Then, the optimal Morlet wavelet basis $\varvec{A}_{2}$ is obtained by correlation filtering, as presented in Fig. 25a. With the constructed dictionary $\varvec{A}_{2}$, the iterative procedure is carried out to obtain the sparse vector $\hat{c}_{2}$ representing the gear fault feature, as shown in Fig. 25b. Its reconstructed signal is shown in Fig. 25c. The envelope spectrum of the reconstructed signal is illustrated in Fig. 25d, from which the fault characteristic frequency of the gear can be identified as 25.6 Hz, close to the theoretical value of 24.67 Hz. The analysis indicates that there is a localized gear fault in the tested gearbox.

6 Discussions

In this chapter, a new transient extraction technique is introduced based on the sparse representation. To be more specific, the sparse representation model and over-complete dictionary are first constructed, and then the model can be solved by optimizing either the data fidelity term or the penalty term. Both are effective in extracting the transients and identifying the periodic parameters. The effectiveness has been demonstrated by the experimental applications. However, some issues about the proposed method still remain to be discussed.

(1) In this chapter, the $l_{1}$-norm is used to replace the $l_{0}$-norm in the sparse representation model. Another available sparsity measure is the $l_{p}$-norm, leading to the following problem:

$$\mathop {\min }\limits_{{\varvec{c}}} \left\| \varvec{c} \right\|_{p}^{p} \quad {\text{s.t.}} \quad \left\| {\varvec{Ac} - \varvec{y}} \right\|_{2}^{2} \le \varepsilon$$

(50)

Choosing p < 1 leads to a sparse solution; however, it also leads to a non-convex optimization problem. More generally, the $l_{p}$-norm can be replaced by a function $J\left({\varvec{c}} \right) = \sum\nolimits_{i} {\rho \left({c_{i} } \right)}$: any such function with $\rho \left( {c_{i} } \right)$ symmetric, monotonically non-decreasing, and with a monotonically non-increasing derivative for $c_{i} \ge 0$ leads to sparsity [7].
(2) Selection of the wavelet basis is one of the key issues for the proposed method because it influences the sparsity of the coefficient vector c. As the noise amplitude increases, the correlation values decrease sharply, leading to an error between the estimated value and the theoretical one. Besides, empirical knowledge about gearbox and bearing faults is used to construct the over-complete dictionary. Therefore, if the dictionary could be learned from the measured signal by adding fault features of rotating components, the algorithms in this chapter would be more powerful in mechanical fault diagnosis.
(3) The strategy for determining the optimal wavelet atom and the algorithms for solving the sparse representation model are also vital for a successful application of sparse representation to machinery fault feature extraction.
- This chapter employs correlation filtering for the optimal wavelet atom selection. Its disadvantage is that a larger interval range and a smaller step for the parameter subset Ψ increase the accuracy of the result but incur excessive computation, thereby decreasing the efficiency of the method. Therefore, the strategy for optimal wavelet basis selection should be further developed to ensure both computational efficiency and estimation accuracy.
- This chapter utilizes the SALSA and MM algorithms to solve the BPDN problem. However, simpler and more straightforward, yet still effective, algorithms for solving the sparse representation model have not been widely explored.
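
The remark in item (1) that a smaller p favours sparse solutions can be illustrated numerically. The sketch below compares the penalty $\left\| \varvec{c} \right\|_{p}^{p}$ for two hypothetical coefficient vectors with the same $l_{2}$ energy, one concentrated on a single atom and one spread over four atoms; the vectors and the function name are illustrative assumptions.

```python
def lp_penalty(c, p):
    """Sum of |c_i|^p over the non-zero entries of c."""
    return sum(abs(v) ** p for v in c if v != 0)

sparse = [1.0, 0.0, 0.0, 0.0]     # one active atom
dense = [0.5, 0.5, 0.5, 0.5]      # same l2 energy spread over four atoms

ratios = {}
for p in (2.0, 1.0, 0.5):
    ratios[p] = lp_penalty(dense, p) / lp_penalty(sparse, p)
    print(f"p = {p}: penalty(dense) / penalty(sparse) = {ratios[p]:.3f}")
```

For p = 2 the two vectors are penalized equally, for p = 1 the dense vector costs twice as much, and for p = 0.5 the gap widens further, so lowering p increasingly rewards concentrating the energy in few coefficients.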

References

1. Bruckstein A.M., Donoho D.L., Elad M., “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Review, 2009, 51(1): 34–81.
2. Yang J., Wright J., Huang T.S., et al., “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, 2010, 19(11): 2861–2873.
3. Wright J., Ma Y., Mairal J., et al., “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, 2010, 98(6): 1031–1044.
4. Liu H., Liu C., Huang Y., “Adaptive feature extraction using sparse coding for machinery fault diagnosis,” Mechanical Systems and Signal Processing, 2011, 25(2): 558–574.
5. Peng F., Yu D., Luo J., “Sparse signal decomposition method based on multi-scale Chirplet and its application to the fault diagnosis of gearboxes,” Mechanical Systems and Signal Processing, 2011, 25(2): 549–557.
6. Yang H., Mathew J., Ma L., “Fault diagnosis of rolling element bearings using basis pursuit,” Mechanical Systems and Signal Processing, 2005, 19(2): 341–356.
7. Elad M., Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, Israel, 2010.
8. Rubinstein R., Bruckstein A.M., Elad M., “Dictionaries for sparse representation modeling,” Proceedings of the IEEE, 2010, 98(6): 1045–1057.
9. Aviyente S., “Compressed sensing framework for EEG compression,” IEEE/SP 14th Workshop on Statistical Signal Processing, 2007: 181–184.
10. Ghoraani B., Krishnan S., “Time–frequency matrix feature extraction and classification of environmental audio signals,” IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7): 2197–2209.
11. Yeste-Ojeda O.A., Grajal J., López-Risue G., “Atomic decomposition for radar applications,” IEEE Transactions on Aerospace and Electronic Systems, 2008, 44(1): 187–200.
12. Zou H., Dai Q., Wang R., et al., “Parametric TFR via windowed exponential frequency modulated atoms,” IEEE Signal Processing Letters, 2001, 8(5): 140–142.
13. Li X., Zhao K., Liu D., et al., “Feature extraction and identification of underground nuclear explosion and natural earthquake based on FMmlet transform and BP neural network,” Advances in Neural Networks-ISNN 2004. Springer Berlin Heidelberg, 2004: 925–930.
14. Meng Q.J., Sun N., “A comparison study on Gabor, Chirplet, FMmlet atom databases for ECG signal processing,” 3rd IEEE International Conference on Bioinformatics and Biomedical Engineering, 2009: 1–3.
15. Mallat S.G., Zhang Z., “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, 1993, 41(12): 3397–3415.
16. Donoho D.L., Tsaig Y., Drori I., et al., “Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit,” IEEE Transactions on Information Theory, 2012, 58(2): 1094–1121.
17. Needell D., Vershynin R., “Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit,” Found. Comput. Math., 2009, 9(3): 317–334.
18. Chen S.S., Donoho D.L., Saunders M.A., “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comput., 1998, 20(1): 33–61.
19. Donoho D.L., Huo X., “Uncertainty principles and ideal atomic decomposition,” IEEE Transactions on Information Theory, 2001, 47(7): 2845–2862.
20. Feng K., Jiang Z., He W., et al., “Rolling element bearing fault detection based on optimal antisymmetric real Laplace wavelet,” Measurement, 2011, 44(9): 1582–1591.
21. Zheng H., Li Z., Chen X., “Gear fault diagnosis based on continuous wavelet transform,” Mechanical Systems and Signal Processing, 2002, 16(2): 447–457.
22. Droitcour A., Boric-Lubecke O., Lubecke V., et al., “Range correlation and I/Q performance benefits in single-chip silicon Doppler radars for noncontact cardiopulmonary monitoring,” IEEE Transactions on Microwave Theory and Techniques, 2004, 52(3): 838–848.
23. Strang G., Nguyen T., Wavelets and Filter Banks, SIAM, 1996.
24. Daubechies I., Defrise M., De Mol C., “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun Pur Appl Math, 2003, 57: 1413–1457.
25. Beck A., Teboulle M., “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, 2009, 2(1): 183–202.
26. Afonso M.V., Bioucas-Dias J.M., Figueiredo M.A.T., “Fast image recovery using variable splitting and constrained optimization,” IEEE Transactions on Image Processing, 2010, 19(9): 2345–2356.
27. Hunter D.R., Lange K., “A tutorial on MM algorithms,” The American Statistician, 2004, 58(1): 30–37.

Author information

Authors and Affiliations

School of Urban Rail Transportation, Soochow University, Suzhou, People’s Republic of China
Zhongkui Zhu, Wei Fan, Gaigai Cai, Weiguo Huang & Juanjuan Shi

Corresponding author

Correspondence to Zhongkui Zhu.

Editor information

Editors and Affiliations

School of Instrument Science and Engineering, Southeast University, Nanjing, China
Ruqiang Yan
School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, China
Xuefeng Chen
Department of Engineering, Macquarie University, Sydney, New South Wales, Australia
Subhas Chandra Mukhopadhyay

About this chapter

Cite this chapter

Zhu, Z., Fan, W., Cai, G., Huang, W., Shi, J. (2017). Sparse Representation of the Transients in Mechanical Signals. In: Yan, R., Chen, X., Mukhopadhyay, S. (eds) Structural Health Monitoring. Smart Sensors, Measurement and Instrumentation, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-56126-4_9

DOI: https://doi.org/10.1007/978-3-319-56126-4_9
Published: 30 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56125-7
Online ISBN: 978-3-319-56126-4
eBook Packages: Engineering, Engineering (R0)
