1 Introduction

As an important electrical component of the subway train, the auxiliary inverter is responsible for supplying power to all electrical equipment apart from train traction. The power quality and operating stability of the auxiliary inverter therefore directly affect the running efficiency and safety of the subway train. Fault signals in the auxiliary inverter are typically nonlinear and non-stationary. Consequently, accurate fault diagnosis of the auxiliary inverter is of great significance for guaranteeing the safe operation of the subway train.

Wavelet analysis is well suited to processing non-stationary signals: it provides local analysis in both the time and frequency domains [1, 2] and performs refined, multi-level analysis of a signal through scaling and shifting operations. The wavelet packet transform [3] is an orthogonal decomposition method built on the multi-resolution analysis of the wavelet transform, and it decomposes and reconstructs a signal more finely. This paper composes the feature vector from the frequency-band energies extracted by the wavelet packet transform; such a vector suffers from high dimensionality and strong correlation among its components. Principal component analysis (PCA) [4] is a multivariate statistical data-mining technique that reduces the dimension of data containing a large amount of correlated information. After dimension reduction, the new data preserve the main information of the original data, so PCA is widely applied in signal processing and artificial neural networks. Artificial neural networks have unique capabilities of self-learning, self-organization, associative memory and parallel processing [5]. The wavelet neural network is a feed-forward network that takes a wavelet basis function as its activation function [6]; it is widely used in fault diagnosis and has achieved good results in practice.

This paper consists of five sections. Initial fault feature extraction by the wavelet packet transform and the dimension reduction of the extracted feature vectors by PCA are described in the next section. The basic theory of the wavelet neural network is given in Sect. 14.3. Simulation results and conclusions are presented in Sects. 14.4 and 14.5.

2 Fault Feature Extraction

Fault diagnosis belongs to the field of pattern recognition, and feature extraction is an essential part of the diagnosis.

2.1 Initial Feature Vector Extracted by Wavelet Packet

The wavelet packet transform further decomposes the high-frequency parts of a signal and adaptively selects the frequency bands that match the signal spectrum, improving the time-frequency resolution. Wavelet packet analysis is thus a deeper form of multi-resolution analysis and gives a stronger description of the fault signal.

For the fault diagnosis of the auxiliary inverter, this paper extracts the frequency-band energies as the initial feature vector. The extraction steps are as follows; a code sketch is given after Eq. (14.3).

  1. Select an appropriate scale to decompose the fault signal. This paper decomposes the fault signal at level 3 and extracts the coefficients of the eight frequency bands of the 3rd level, ordered from low frequency to high frequency.

  2. Reconstruct the decomposed wavelet packet coefficients and extract the signal of each frequency band. \( S_{ij} \) denotes the reconstructed signal of the jth node at the ith level.

  3. Extract the energy feature from the reconstructed signal \( S_{3k} \):

$$ E_{3k} = \int {\left| {S_{3k} \left( t \right)} \right|^{2} } dt = \sum\limits_{i = 0}^{N - 1} {\left| {d_{ki} } \right|^{2} } ,\quad k = 0,1, \ldots ,7 $$
(14.1)

where i is the sample-point index, \( d_{ki} \) is the amplitude of the ith discrete point of the reconstructed signal \( S_{3k} \), and N is the total number of sample points.

  4. Calculate the total energy E and compose the normalized feature vector T:

$$ E = \sum\limits_{j = 0}^{7} {E_{3j} } $$
(14.2)
$$ T = \left[ {E_{30} /E,\;E_{31} /E, \ldots ,\;E_{37} /E} \right] $$
(14.3)
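
To make the procedure concrete, below is a minimal Python sketch of steps 1-4, assuming the PyWavelets (pywt) package; the function name wp_energy_features and the synthetic 4,096 Hz test signal are illustrative, not part of the paper. For an orthogonal basis such as 'db4' (the one used in Sect. 14.4), the coefficient energy of each level-3 node equals the energy of the reconstructed band signal, so \( E_{3k} \) can be computed directly from the coefficients \( d_{ki} \).

```python
import numpy as np
import pywt

def wp_energy_features(x, wavelet="db4", level=3):
    """Normalized band-energy vector T of Eqs. (14.1)-(14.3)."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # The eight level-3 nodes, ordered from low to high frequency.
    nodes = wp.get_level(level, order="freq")
    # Eq. (14.1): E_3k = sum_i |d_ki|^2 (Parseval, orthogonal basis).
    energies = np.array([np.sum(node.data ** 2) for node in nodes])
    # Eqs. (14.2)-(14.3): divide by the total energy E to get T.
    return energies / energies.sum()

# Hypothetical test signal sampled at 4,096 Hz (a stand-in for a
# recorded fault waveform).
fs = 4096
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.randn(fs)
T = wp_energy_features(x)
print(T, T.sum())  # eight components summing to 1
```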

2.2 Feature Extraction Based on PCA

PCA maps data from the original high-dimensional space to a low-dimensional vector space [7] and describes the sample data with fewer feature values, thereby reducing the dimension and eliminating overlap and redundancy among the samples. The following steps achieve the dimension reduction and the final feature extraction; a code sketch is given after Eq. (14.7).

  1. Normalize the initial feature sample. Suppose \( X = \left( {x_{ij} } \right)_{n \times p} \) is the initial feature sample, where n is the number of feature vectors and p is the number of feature parameters. To avoid network saturation caused by excessively large inputs, normalize the initial feature sample to obtain the processed sample \( X^{*} \):

$$ x_{ij}^{*} = \left( {x_{ij} - \bar{x}_{j} } \right)/S_{j} $$
(14.4)

where \( x_{ij}^{*} \) is the normalized datum, \( \bar{x}_{j} \) is the mean of the jth column of the sample, and \( S_{j} \) is its standard deviation.

  2. Calculate the correlation coefficient matrix. Obtain the coefficient matrix \( R = \left( {r_{ij} } \right)_{p \times p} \) from the normalized sample \( X^{*} \):

$$ r_{ij} = \frac{1}{n}\sum\limits_{k = 1}^{n} {x_{ki}^{*} \times x_{kj}^{*} } $$
(14.5)

where \( x_{ki}^{*} \) and \( x_{kj}^{*} \) are the elements of the kth row of \( X^{*} \) in the ith and jth columns, respectively.

  3. Calculate the eigenvalues and eigenvectors of the correlation coefficient matrix. \( \lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{p} > 0 \) are the eigenvalues, with corresponding eigenvectors \( C = \left[ {c_{1} ,c_{2} , \ldots ,c_{p} } \right] \). The new feature sample is \( Y_{n \times p} = X^{*} C \).

  4. Calculate the contribution rate and the cumulative contribution rate of the principal components. Each eigenvalue reflects the degree of deviation of the corresponding component in the sample. When the cumulative contribution rate of the first m components reaches 95 %, those components are selected to constitute the new feature sample:

$$ \eta_{i} = \lambda_{i} \bigg/ \sum\limits_{k = 1}^{p} {\lambda_{k} } \quad \left( {i = 1,2, \ldots ,p} \right) $$
(14.6)
$$ \varphi \left( \eta \right) = \sum\limits_{j = 1}^{m} {\eta_{j} } \bigg/ \sum\limits_{i = 1}^{p} {\eta_{i} } $$
(14.7)

where \( \eta_{i} \) is the contribution rate of the ith component and \( \varphi \left( \eta \right) \) is the cumulative contribution rate of the first m components, \( m < p \).
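
The four steps translate directly into NumPy. Below is a minimal sketch under the assumptions above (the rows of X are the n initial feature vectors, with p = 8 band energies each); the function name pca_reduce is illustrative.

```python
import numpy as np

def pca_reduce(X, threshold=0.95):
    """Reduce X (n x p) per Eqs. (14.4)-(14.7), keeping m components."""
    n, p = X.shape
    # Eq. (14.4): column-wise standardization.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eq. (14.5): correlation coefficient matrix R = Xs^T Xs / n.
    R = Xs.T @ Xs / n
    # Step 3: eigenvalues/eigenvectors, sorted in descending order
    # (np.linalg.eigh returns them in ascending order).
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Eqs. (14.6)-(14.7): smallest m whose cumulative rate >= threshold.
    contrib = eigval / eigval.sum()
    m = int(np.searchsorted(np.cumsum(contrib), threshold)) + 1
    return Xs @ eigvec[:, :m], contrib[:m]  # reduced features Y, rates
```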

3 Basic Theory of Wavelet Neural Network

The wavelet neural network is an organic combination of wavelet theory and the artificial neural network [8]: it replaces the sigmoid activation function of the hidden layer with a nonlinear wavelet basis function and connects the wavelet transform to the network parameters through an affine transform. The wavelet neural network combines the multi-scale time-frequency analysis capability of the wavelet transform with the generalization ability of the neural network, so it has strong nonlinear approximation, fault-tolerance and pattern classification capabilities.

Because a three-layer feed-forward neural network can approximate a nonlinear mapping to arbitrary precision, the wavelet neural network adopts a three-layer structure with a single hidden layer.

Suppose the input layer has N neurons, the hidden layer has H neurons and the output layer has M neurons. The input sample is \( X = \left( {x_{1} ,x_{2} , \ldots ,x_{N} } \right)^{T} , \) the actual network output is \( \bar{Y} = \left( {\bar{y}_{1} ,\bar{y}_{2} , \ldots ,\bar{y}_{M} } \right)^{T} , \) and the desired output is \( Y = \left( {y_{1} ,y_{2} , \ldots ,y_{M} } \right)^{T} . \)

The hidden layer takes the Morlet basis function as its activation function:

$$ \varphi \left( x \right) = \cos \left( {0.75x} \right)\exp \left( { - x^{2} /2} \right) $$
(14.8)

The output layer takes the sigmoid function as its activation function:

$$ g\left( x \right) = \left[ {1 + \exp \left( { - x} \right)} \right]^{ - 1} $$
(14.9)

The kth output of the wavelet neural network is:

$$ \bar{y}_{k} = g\left[ {\sum\limits_{j = 1}^{H} {w_{jk} \,\varphi_{a,b} \left( {\left( {\sum\limits_{i = 1}^{N} {w_{ij} x_{i} } - b_{j} } \right)\Big/a_{j} } \right)} } \right],\quad k = 1,2, \ldots ,M $$
(14.10)

where \( a_{j} \) and \( b_{j} \) are the scale factor and shift factor of the jth hidden-layer neuron, \( w_{ij} \) is the connection weight between the input layer and the hidden layer, and \( w_{jk} \) is the connection weight between the hidden layer and the output layer.
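
Equations (14.8)-(14.10) amount to a short forward pass. Below is a minimal NumPy sketch with randomly initialized parameters; the shapes follow the N-H-M structure above, and the 6-13-3 sizes in the demo are those used in Sect. 14.4.

```python
import numpy as np

def morlet(x):
    """Hidden-layer activation, Eq. (14.8)."""
    return np.cos(0.75 * x) * np.exp(-x ** 2 / 2)

def sigmoid(x):
    """Output-layer activation, Eq. (14.9)."""
    return 1.0 / (1.0 + np.exp(-x))

def wnn_forward(x, W_in, W_out, a, b):
    """Eq. (14.10): x (N,), W_in (H, N), W_out (M, H), a and b (H,)."""
    z = (W_in @ x - b) / a            # scaled, shifted hidden input
    return sigmoid(W_out @ morlet(z))

# Demo with the 6-13-3 structure used in Sect. 14.4.
N, H, M = 6, 13, 3
rng = np.random.default_rng(0)
y_hat = wnn_forward(rng.standard_normal(N), rng.standard_normal((H, N)),
                    rng.standard_normal((M, H)), np.ones(H), np.zeros(H))
print(y_hat)  # three outputs in (0, 1)
```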

Define the network error E.

$$ E = \sum\limits_{k = 1}^{M} {\left( {\bar{y}_{k} - y_{k} } \right)^{2} } $$
(14.11)

where \( \bar{y}_{k} \) is the actual output and \( y_{k} \) is the desired output.

The wavelet neural network is trained with the gradient descent algorithm, taking the sum of squared errors between the actual and desired outputs as the learning objective for adjusting the network connection weights, scale factors and shift factors. However, gradient descent converges slowly when the number of training samples is large, so this paper introduces a momentum factor to improve the learning efficiency.

The network connection weights, scale factors and shift factors are updated as follows.

$$ \begin{aligned} w_{ij}^{t + 1} & = w_{ij}^{t} - \eta \frac{\partial E}{{\partial w_{ij}^{t} }} + \alpha \Delta w_{ij}^{t} ;\quad w_{jk}^{t + 1} = w_{jk}^{t} - \eta \frac{\partial E}{{\partial w_{jk}^{t} }} + \alpha \Delta w_{jk}^{t} \\ a_{j}^{t + 1} & = a_{j}^{t} - \eta \frac{\partial E}{{\partial a_{j}^{t} }} + \alpha \Delta a_{j}^{t} ;\quad b_{j}^{t + 1} = b_{j}^{t} - \eta \frac{\partial E}{{\partial b_{j}^{t} }} + \alpha \Delta b_{j}^{t} \\ \end{aligned} $$
(14.12)

where t is the iteration number, \( \eta \) is the learning rate, \( \alpha \) is the momentum factor, and \( \Delta w_{ij}^{t} \) (likewise \( \Delta w_{jk}^{t} ,\Delta a_{j}^{t} ,\Delta b_{j}^{t} \)) is the increment applied at the previous iteration.
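
As a minimal sketch of one update from Eq. (14.12): grad is the gradient of E with respect to a parameter array theta (a weight, scale factor or shift factor), delta_prev is the increment memorized from the previous iteration, and the default \( \eta \) and \( \alpha \) are the values used in Sect. 14.4. The helper name momentum_step is illustrative.

```python
def momentum_step(theta, grad, delta_prev, eta=0.01, alpha=0.923):
    """One parameter update of Eq. (14.12) with a momentum term."""
    delta = -eta * grad + alpha * delta_prev  # new increment
    return theta + delta, delta               # keep delta for next step
```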

4 Simulation Experiments

Common fault types of the auxiliary inverter include voltage fluctuation, pulse transient and frequency variation. Thirty groups of data are collected for each type at a sampling frequency of 4,096 Hz; 18 groups of each fault (54 groups in total) are randomly selected as training samples, and the remaining 36 groups are taken as testing samples. The diagnostic labels of voltage fluctuation, pulse transient and frequency variation are (1 0 0), (0 1 0) and (0 0 1), respectively. A sketch of this split is given below.
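
The following is a minimal sketch of the split, assuming each fault class stores its 30 recorded groups row-wise in an array (replaced here by hypothetical random placeholders).

```python
import numpy as np

rng = np.random.default_rng(0)
labels = {"voltage_fluctuation": [1, 0, 0],
          "pulse_transient":     [0, 1, 0],
          "frequency_variation": [0, 0, 1]}
# Hypothetical placeholders for the 30 groups recorded per fault type,
# each group sampled at 4,096 Hz.
data = {name: rng.standard_normal((30, 4096)) for name in labels}

train_x, train_y, test_x, test_y = [], [], [], []
for name, label in labels.items():
    idx = rng.permutation(30)             # random 18/12 split per fault
    train_x.extend(data[name][idx[:18]]); train_y += [label] * 18
    test_x.extend(data[name][idx[18:]]);  test_y += [label] * 12
# 54 training groups and 36 testing groups, as described above.
```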

4.1 Fault Feature Extraction

The wavelet basis ‘db4’ is selected to decompose the fault signals and obtain the eight-dimensional feature vectors; part of the initial feature samples is shown in Table 14.1.

Table 14.1 Parts of initial feature samples

The initial feature samples extracted by the wavelet packet transform are then analyzed by PCA, as shown in Table 14.2.

Table 14.2 Analysis of initial feature samples by PCA

As Table 14.2 shows, the cumulative contribution rate of the first 6 components reaches 96.08 %, which exceeds the prescribed 95 % and retains nearly all of the information. Therefore, the first 6 components are selected to compose the final fault feature vector, achieving the dimension reduction.

4.2 Simulation Results

Based on the fault feature samples processed by PCA, the numbers of input-layer and output-layer neurons of the PCA-WNN are set to 6 and 3, respectively. Based on extensive tests, the number of hidden-layer neurons is set to 13. The learning rate is 0.01, the momentum factor is 0.923 and the target error is 0.01. For comparison, the initial feature samples are taken as the input of a wavelet neural network (WP-WNN) with the structure 8-13-3, with all other parameters the same as for the PCA-WNN. The training error curves of the PCA-WNN and WP-WNN are shown in Figs. 14.1 and 14.2. Table 14.3 shows part of the network output of the PCA-WNN, and Table 14.4 compares the diagnostic results of the PCA-WNN and WP-WNN.

Fig. 14.1 Training error curve of PCA-WNN

Fig. 14.2 Training error curve of WP-WNN

Table 14.3 Parts of network output of PCA-WNN
Table 14.4 Diagnostic comparison of PCA-WNN and WP-WNN

As shown in Figs. 14.1 and 14.2, with a target error of 0.01, the network error of the PCA-WNN converges to 0.0098 in 84 steps, while the WP-WNN needs 218 training steps to reach 0.0099. As seen in Tables 14.3 and 14.4, the actual outputs of the PCA-WNN closely approximate the desired outputs, with a fault diagnostic accuracy of 97.22 %, whereas the diagnostic accuracy of the WP-WNN is 86.11 %.

5 Conclusion

This paper proposes a fault diagnosis method for the auxiliary inverter based on PCA and the wavelet neural network. The initial feature vector extracted by the wavelet packet transform reflects the energy variation of the fault signal, while PCA reduces the dimension of the feature vector and simplifies the network structure. Simulation results show that the proposed diagnostic method offers fast convergence and high classification precision, and is applicable to the diagnosis of the auxiliary inverter.