1 Introduction

Linear regression models are widely used in signal processing, communication, and many other fields, under the assumption that the observed data are fully available. To identify such models, researchers have proposed a large number of adaptive algorithms, such as the least mean square (LMS) algorithm [13], the normalized LMS (NLMS) algorithm, and the affine projection algorithm (APA) [14, 30, 33, 42].

Unfortunately, the requirement of a fully observed linear regression may not be met in many practical applications. In general, output data whose value exceeds the limit of the recording device cannot be observed [3, 15, 29]; in other words, only the output data whose values lie in a certain range are available. This situation occurs in economics [4], statistics [11, 27], engineering applications [28, 26], and medical research [7, 31]. In some systems, complete data cannot be collected efficiently because of the saturation characteristics of sensors [1, 9, 43]. For example, in microphone array signal processing [2, 16], when the amplitude of a speech signal exceeds a certain threshold of the microphones, it is cut off and the signal waveform may be flattened, because the positive and negative peaks exceeding the threshold are lost, or censored. In fact, censored regression can be seen as a nonlinear regression model consisting of a saturating nonlinearity and a linear system [23, 38]. Since the output data of a censored regression may lose significant information, using traditional algorithms to identify this type of model may result in biased and wrong estimates [22]. Recently, in an attempt to deal with the censored regression problem, numerous algorithms have been proposed, such as maximum likelihood (ML) methods [8], the two-step estimator [15], and least absolute deviation [29]. To solve online censored regression problems, Liu et al. [24] proposed the adaptive Heckman two-step algorithm (TSA), which significantly outperforms the conventional adaptive algorithms when the output data is censored.

As is well known, most adaptive algorithms are designed for a Gaussian noise environment. However, real-world signals often exhibit non-Gaussian properties. For example, in beamforming, sub-Gaussian (light-tailed) signals are frequently encountered [17]. In some active noise control (ANC) applications, mechanical friction, vibration noise, and speech signals are super-Gaussian/impulsive (heavy-tailed) signals [39]. In blind source separation (BSS), adaptive receivers with multiple antennas, and image denoising, the signal may be a mixture of sub-Gaussian and super-Gaussian/impulsive signals [18,19,20, 32, 35, 40, 46].

In an attempt to improve the robustness of adaptive algorithms in mixed non-Gaussian background noise environments, a family of robust M-shaped (FRMS) functions was applied to the adaptive algorithms [45].

Furthermore, when the system exhibits a certain degree of sparsity, the aforementioned algorithms cannot exploit this characteristic of the system. To deal with this issue, proportional algorithms [10] and zero-attraction algorithms based on various norms, such as the \( l_{0} \)-norm, \( l_{1} \)-norm, and \( l_{p} \)-norm [5, 12, 25, 34, 37, 39, 46], have been applied to this type of system. In fact, the ideal sparsity measure is the \( l_{0} \)-norm, which counts the number of nonzero components. Therefore, the \( l_{0} \)-norm constraint is adopted in this paper.

The contributions of this paper are as follows:

  1. (i)

    For the first time, a family of robust M-shaped (FRMS) algorithms for mixed non-Gaussian background noises under the censored regression model is proposed. The algorithm not only effectively compensates the censored output signal, but also copes with the effects of sub-Gaussian and super-Gaussian/impulsive noises.

  2. (ii)

    To exploit the characteristics of sparse systems, the \( l_{0} \)-norm proportional FRMS algorithm under the censored regression model (\( l_{0} \)-CRPFRMS) is also proposed for the first time.

  3. (iii)

    Simulation examples are provided to demonstrate the performance of the proposed algorithms in non-Gaussian background noise environments.

2 Description and Preliminaries

2.1 Problem Formulation

Consider the following linear regression model

$$ \hat{d}_{n} = {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \eta_{n} + \xi_{n} $$
(1)

where \( {\mathbf{w}}_{o} \) is the unknown weight vector to be identified, \( {\mathbf{u}}_{n} \) is the input vector, \( \eta_{n} \) denotes the background noise with zero mean and variance \( \sigma_{i}^{2} \) (both sub-Gaussian and super-Gaussian noises are considered in this paper), and \( \xi_{n} \) represents the impulsive noise with zero mean. When the data \( \hat{d}_{n} \) and \( {\mathbf{u}}_{n} \) are completely observed, the unknown vector \( {\mathbf{w}}_{o} \) can be identified easily by traditional methods. However, \( \hat{d}_{n} \) may not be completely observed in some practical applications. The censored output \( d_{n} \) is formulated as

$$ d_{n} = ({\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \eta_{n} )_{ + } = (\hat{d}_{n} )_{ + } $$
(2)

where \( (\hat{d}_{n} )_{ + } = \hbox{max} \{ 0,\hat{d}_{n} \} \).
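
To make the model concrete, a minimal data-generation sketch for (1)–(2) is given below (Python/NumPy); the system length, noise level, Gaussian input, and the omission of the impulsive term \( \xi_{n} \) are illustrative assumptions rather than settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 1000                      # filter length and number of samples (assumed)
w_o = rng.standard_normal(L)        # unknown system w_o
U = rng.standard_normal((N, L))     # input regressors u_n
eta = 0.1 * rng.standard_normal(N)  # zero-mean background noise

d_hat = U @ w_o + eta               # uncensored output, Eq. (1) without the impulsive term
d = np.maximum(0.0, d_hat)          # left-censored output, Eq. (2)
print(f"{np.mean(d_hat < 0):.1%} of the samples are censored")
```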

Remark 1

In this paper, although only left-censoring at 0 is considered in the algorithm derivation, the right-censored and two-sided-censored cases can be handled in the same framework, and the processing method is similar. In particular, if the output \( \hat{d}_{n} \) is left-censored or right-censored at a constant \( c \), the censored output \( d_{n} \) can be expressed as

$$ d_{n} = \hbox{max} \{ c,\hat{d}_{n} \} = (\hat{d}_{n} - c)_{ + } + c $$
(3)

and

$$ d_{n} = \hbox{min} \{ c,\hat{d}_{n} \} = c - (c - \hat{d}_{n} )_{ + } $$
(4)

respectively.

2.2 Review of FRMS Algorithm

In [23], the authors divide the existing error-nonlinearity adaptive algorithms based on LMS into three categories: V-shaped, \( \varLambda \)-shaped, and M-shaped algorithms. After comparing the advantages and disadvantages of each category, the FRMS function was proposed, whose weighting function is

$$ f(e_{n} ) = \frac{{|e_{n} |^{p} }}{{\varsigma + |e_{n} |^{p + 1} }},p > 0 $$
(5)

where \( \varsigma > 0 \) is a shape parameter. For instance, when \( \varsigma \to 0 \), the algorithm behaves as a \( \varLambda \)-shaped algorithm (\( \varLambda \)-shaped algorithms are better suited to super-Gaussian noise than to sub-Gaussian noise). When \( \varsigma \to \infty \), it behaves as a V-shaped algorithm (V-shaped algorithms are more suitable for sub-Gaussian noise). In the proposed robust M-shaped function, the denominator is one order higher than the numerator. Using gradient descent, this algorithm essentially minimizes the stochastic cost function

$$ J(e_{n} ) = \int_{0}^{{e_{n} }} {\frac{{x|x|^{p} }}{{\varsigma + |x|^{p + 1} }}} {\text{d}}x $$
(6)

which is positive semi-definite for \( \varsigma > 0 \) and \( p > 0 \). Thus, the proposed robust M-shaped algorithm does not suffer from the local-minimum problem.
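
To illustrate how \( \varsigma \) interpolates between the \( \varLambda \)-shaped and V-shaped behaviors, the weighting function (5) can be evaluated directly, as in the following sketch (Python/NumPy; the error grid and the values of \( p \) and \( \varsigma \) are arbitrary).

```python
import numpy as np

def frms_weight(e, p=2.0, zeta=0.5):
    """FRMS weighting function f(e) = |e|^p / (zeta + |e|^(p+1)), Eq. (5)."""
    return np.abs(e) ** p / (zeta + np.abs(e) ** (p + 1))

e = np.linspace(-5, 5, 11)
print(frms_weight(e, p=2, zeta=1e-3))  # small zeta: weight decays with |e| (Lambda-shaped behavior)
print(frms_weight(e, p=2, zeta=1e3))   # large zeta: weight grows with |e| (scaled V-shaped behavior)
```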

3 Proposed New Algorithm

3.1 Proposed CR-FRMS Algorithm

Since the observations in this paper are assumed to be censored, the FRMS algorithm would obtain a biased estimate, an effect known as sample selection bias [34, 44]. In other words, when \( \hat{d}_{n} < 0 \), the data \( \hat{d}_{n} \) is missing, which leads to the bias and the inequality \( E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] \ne {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} \). To compensate for the bias, it is first necessary to correct the sample selection bias of FRMS in the censored observations, which is inspired by the Heckman two-step approach [15]. Due to the left-censored property of \( d_{n} \), only positive values of \( d_{n} \) can be observed correctly. Recalling (1) and noting that the background noise \( \eta_{n} \) and the impulsive noise \( \xi_{n} \) are both zero-mean signals, the expectation of \( d_{n} \) under the condition \( d_{n} > 0 \) can be expressed as

$$ \begin{aligned} & E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,d_{n} > 0] \\ & \quad = E[\hat{d}_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,\hat{d}_{n} > 0] \\ & \quad = {\mathbf{u}}_{n}^{T} {\mathbf{w}}_{o} + E[\eta_{n} + \xi_{n} |\eta_{n} + \xi_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ]. \\ \end{aligned} $$
(7)

Since the impulsive noise occurs with a very low probability, the following approximation is reasonable,

$$ E[\eta_{n} + \xi_{n} |\eta_{n} + \xi_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] \approx E[\eta_{n} |\eta_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] $$
(8)

Before calculating the last term of (7), the following lemma is introduced.

Lemma

The conditional expectation \( E[x|x > - c] \) satisfies

$$ E[x|x > - c] = \frac{\varphi (c)}{{\Phi (c)}} = \varOmega (c). $$
(9)

Proof

See “Appendix” for detail.
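
For the Gaussian case (where \( \varphi \) and \( \varPhi \) are the standard normal pdf and cdf), the lemma can be checked numerically with a short Monte Carlo sketch (Python/SciPy; the value of \( c \) and the sample size are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
c = 0.7
x = rng.standard_normal(2_000_000)
empirical = x[x > -c].mean()               # Monte Carlo estimate of E[x | x > -c]
inverse_mills = norm.pdf(c) / norm.cdf(c)  # Omega(c) = phi(c) / Phi(c), Eq. (9)
print(empirical, inverse_mills)            # the two values should be close
```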

Using the lemma, we have

$$ E[\eta_{n} |\eta_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] = \sigma_{i} \varOmega ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) $$
(10)

with \( \varphi ( \cdot ) \) and \( \varPhi ( \cdot ) \) given in Table 1. In addition, the vector \( {\varvec{\upalpha}} \) is given by

$$ {\varvec{\upalpha}} = \frac{{{\mathbf{w}}_{o} }}{{\sigma_{i} }} $$
(11)
Table 1 Probability density function and distribution function of three different noises

Then, using (7), (10), and the law of total expectation yields

$$ \begin{aligned} E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] & = Pr(d_{n} > 0)E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,d_{n} > 0] \\ & = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \\ \end{aligned} $$
(12)

where the second equality follows from the fact that the probability of \( d_{n} > 0 \) equals \( \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \), i.e., \( Pr(d_{n} > 0) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \). According to (12), the censored regression model (2) can be expressed as:

$$ d_{n} = {\varPhi} ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) + v_{n} $$
(13)

where \( v_{n} \) is a zero-mean random variable, i.e.,

$$ E[v_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] = 0. $$
(14)

Since algorithms based on the MSE criterion cannot estimate \( {\mathbf{w}}_{o} \) correctly in this setting, the cost function based on FRMS is adopted:

$$ \zeta_{\text{FRMS}} = E\left( {\frac{{|\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})|^{p} }}{{\varsigma + |\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})|^{p + 1} }}} \right) $$
(15)

where

$$ \varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}}) = d_{n} - \left( {\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})} \right). $$
(16)

Obviously, to estimate \( {\mathbf{w}}_{o} \), estimates of \( {\varvec{\upalpha}} \) and \( \sigma_{i} \) are also required. In the sequel, an indicator variable \( a_{n} \) is introduced to estimate \( {\varvec{\upalpha}} \),

$$ a_{n} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\;d_{n} > 0} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(17)

The probabilities of \( a_{n} \) are given by

$$ Pr(a_{n} = 1) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) $$
(18)
$$ Pr(a_{n} = 0) = \varPhi ( -{\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}). $$
(19)

Then, the following optimization problem is considered to estimate \( {\varvec{\upalpha}} \) [22],

$$ {\boldsymbol{\hat{\alpha}}} = \arg \mathop {\hbox{max} }\limits_{{{\boldsymbol{\bar{\alpha }}}}} \varGamma ({\boldsymbol{\bar{\alpha }}}) = \arg \mathop {\hbox{max} }\limits_{{{\boldsymbol{\bar{\alpha }}}}} E[\varGamma_{n} ({\boldsymbol{\bar{\alpha }}})] $$
(20)

where

$$ \varGamma ({\boldsymbol{\bar{\alpha }}}) = E[\varGamma_{n} ({\boldsymbol{\bar{\alpha }}})] $$
(21)

and

$$ \varGamma_{n} ({\boldsymbol{\bar{\alpha }}}) = \log (Pr(d_{n} |{\mathbf{u}}_{n} ,{\varvec{\upalpha}})) $$
(22)

with

$$ Pr(d_{n} |{\mathbf{u}}_{n} ,{\varvec{\upalpha}}) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})^{{a_{n} }} \varPhi ( -{\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})^{{1 - a_{n} }} . $$
(23)

In the sequel, using the steepest ascent principle yields [22]

$$ \begin{aligned} {\boldsymbol{\hat{\alpha }}}_{n} & = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu \frac{{\partial \varGamma_{n} ({\varvec{\upalpha}})}}{{\partial {\varvec{\upalpha}}}}|_{{{\boldsymbol{\hat{\alpha }}}_{n - 1} }} \\ & = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu [a_{n} \varOmega (u_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n - 1} ){\mathbf{u}}_{n} - (1 - a_{n} )\varOmega ( - {\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n - 1} ){\mathbf{u}}_{n} ]. \\ \end{aligned} $$
(24)
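
For the Gaussian case, the steepest-ascent recursion (24) could be sketched as follows (Python/SciPy); the step size, the indicator construction, and the small constant guarding the division are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def omega(c):
    """Omega(c) = phi(c) / Phi(c), Eq. (9), for the Gaussian case."""
    return norm.pdf(c) / np.maximum(norm.cdf(c), 1e-12)

def alpha_update(alpha_hat, u_n, d_n, mu=0.01):
    """One steepest-ascent step of Eq. (24), with a_n = 1 if d_n > 0 and 0 otherwise."""
    a_n = 1.0 if d_n > 0 else 0.0
    z = u_n @ alpha_hat
    return alpha_hat + mu * (a_n * omega(z) - (1.0 - a_n) * omega(-z)) * u_n
```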

Then, the estimation of \( \sigma_{i} \) is considered. Applying the gradient descent method to the cost function \( f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})) \) with respect to \( \sigma_{i} \), evaluated at \( \hat{\sigma }_{i,n - 1} \), we have

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}}))}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(25)

where

$$ f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} )) = \frac{{|d_{n} - (\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}))|^{p} }}{{\varsigma + |d_{n} - (\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}))|^{p + 1} }}. $$
(26)

Similarly, the weight update formula can be obtained by the gradient descent method

$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} - \mu \frac{{\partial \zeta_{\text{FRMS}} ({\mathbf{w}})}}{{\partial {\mathbf{w}}}}|_{{{\mathbf{w}}_{n - 1} }} \\ & = {\mathbf{w}}_{n - 1} + \mu\Phi _{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(27)

where

$$ \zeta_{\text{FRMS}} ({\mathbf{w}}) = f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} )). $$
(28)
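
Combining the compensated error (16) with the updates (25) and (27), one CR-FRMS iteration for Gaussian background noise could be sketched as below (Python/SciPy). Here \( {\boldsymbol{\hat{\alpha }}}_{n} \) is assumed to have already been refreshed by the recursion (24) (e.g., via the alpha_update sketch above), the signs follow the model (13), and the step size and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def frms_weight(e, p=2.0, zeta=0.5):
    """FRMS weighting function of Eq. (5)."""
    return np.abs(e) ** p / (zeta + np.abs(e) ** (p + 1))

def crfrms_step(w, sigma, alpha_hat, u_n, d_n, mu=0.03, p=2.0, zeta=0.5):
    """One CR-FRMS iteration: compensation (16), scale update (25), weight update (27)."""
    z = u_n @ alpha_hat
    # compensated error, built from the censored-regression model (13)
    eps = d_n - (norm.cdf(z) * (u_n @ w) + sigma * norm.pdf(z))
    f = frms_weight(eps, p, zeta)
    sigma = sigma + mu * norm.pdf(z) * f * eps   # Eq. (25)
    w = w + mu * norm.cdf(z) * f * eps * u_n     # Eq. (27)
    return w, sigma
```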

3.2 Proposed \( l_{0} \)-CRPFRMS Algorithm

In many practical applications, such as digital TV transmission channels and echo paths, the systems to be identified are sparse. However, the algorithm in Sect. 3.1 does not make full use of the characteristics of a sparse system. Therefore, this section proposes an \( l_{0} \)-norm proportional FRMS algorithm based on the censored regression model (\( l_{0} \)-CRPFRMS).

A sparse system is one whose impulse response contains many near-zero coefficients and only a few large ones. In [36], the author first proposed the proportional idea, i.e., the proportionate normalized LMS (PNLMS) algorithm. Its iteration is as follows:

$$ \hat{y}(k) = \sum\limits_{n = 0}^{N - 1} {\hat{h}_{n} (k)x(k - n)} $$
(29)
$$ e(k) = y(k) - \hat{y}(k) $$
(30)
$$ l_{\infty } (k) = \hbox{max} \{ |\hat{h}_{0} (k)|, \ldots ,|\hat{h}_{N - 1} (k)|\} $$
(31)
$$ l^{\prime}_{\infty } (k) = \hbox{max} \{ \delta ,l_{\infty } (k)\} $$
(32)
$$ g_{n} (k) = \hbox{max} \{ \rho l^{\prime}_{\infty } (k),|\hat{h}_{n} (k)|\} $$
(33)
$$ \bar{g}(k) = \frac{1}{N}\sum\limits_{n = 0}^{N - 1} {g_{n} (k)} $$
(34)
$$ \hat{\sigma }_{x}^{2} (k) = \frac{1}{N}\sum\limits_{n = 0}^{N - 1} {x^{2} (k - n)} $$
(35)
$$ \hat{h}_{n} (k + 1) = \hat{h}_{n} (k) + \frac{\mu }{N}\frac{{g_{n} (k)}}{{\bar{g}(k)}}\frac{e(k)x(k - n)}{{\hat{\sigma }_{x}^{2} (k)}} . $$
(36)

where \( x( \cdot ) \) is the input signal, \( \hat{y}( \cdot ) \) is the filter output, \( y( \cdot ) \) is the desired signal, and \( \hat{h}_{n} ( \cdot ) \) is the tap weight. The parameters \( \rho \) and \( \delta \) control the small-signal regularization: in (33), \( \rho \) prevents \( \hat{h}_{n} (k + 1) \) from stalling when it is much smaller than the largest coefficient, and in (32), \( \delta \) regularizes the update when all coefficients are zero at initialization. It can be seen from (31)–(34) and (36) that the closer a tap weight is to zero, the smaller the \( g_{n} (k)/\bar{g}(k) \) term becomes; conversely, a tap weight far from zero receives a larger gain in the iteration. In this way, the aim of the proportional algorithm is achieved, that is, the convergence speed is accelerated without changing the steady-state mean square deviation (MSD).
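
A compact sketch of one PNLMS iteration following (29)–(36) is given below (Python/NumPy); the values of \( \mu \), \( \rho \), and \( \delta \) are illustrative, and x_buf is assumed to hold the N most recent input samples \( x(k), \ldots ,x(k - N + 1) \).

```python
import numpy as np

def pnlms_step(h_hat, x_buf, y_k, mu=0.5, rho=0.01, delta=0.01):
    """One PNLMS iteration, Eqs. (29)-(36)."""
    N = len(h_hat)
    y_hat = h_hat @ x_buf                                 # Eq. (29)
    e_k = y_k - y_hat                                     # Eq. (30)
    l_inf = max(delta, np.max(np.abs(h_hat)))             # Eqs. (31)-(32)
    g = np.maximum(rho * l_inf, np.abs(h_hat))            # Eq. (33)
    g_bar = g.mean()                                      # Eq. (34)
    sigma_x2 = np.mean(x_buf ** 2)                        # Eq. (35)
    h_hat = h_hat + (mu / N) * (g / g_bar) * e_k * x_buf / sigma_x2  # Eq. (36)
    return h_hat, e_k
```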

To further accelerate the convergence of the algorithm, this section incorporates another sparsity-exploiting technique, the \( l_{0} \)-norm constraint [12, 34], on top of the proportional update. As the name implies, a penalty term \( \gamma ||{\mathbf{w}}||_{0} \) is added to the original cost function. In this section, the cost function becomes

$$ \vartheta (n) = \int_{0}^{{e_{n} }} {\frac{{x|x|^{p} }}{{\varsigma + |x|^{p + 1} }}} {\text{d}}x + \gamma ||{\mathbf{w}}_{n} ||_{0} . $$
(37)

where \( \gamma > 0 \) is a factor that balances the new penalty against the estimation error. In [12], in order to reduce the computational complexity, the author first approximates \( ||{\mathbf{w}}_{n} ||_{0} \) by \( \sum\nolimits_{i = 0}^{L - 1} {(1 - {\text{e}}^{{ - \beta |w_{n} (i)|}} )} \), then applies the gradient descent method to the cost function, then performs a first-order Taylor expansion on the zero-attracting term in the iterative formula, and finally obtains a weight update formula with lower computational complexity,

$$ {\mathbf{w}}_{n} = {\mathbf{w}}_{n - 1} + \;{\text{gradient}}\;{\text{correction}} + {\text{zero}}\;{\text{attraction}} $$
(38)

where the zero-attraction term is \( \kappa f_{\beta } ({\mathbf{w}}_{n} ) \), \( \kappa = \mu \gamma \) is a positive constant, and \( f_{\beta } ( \cdot ) \) is defined as

$$ f_{\beta } (x) = \left\{ {\begin{array}{*{20}l} {\beta^{2} x + \beta ,} \hfill & { - \tfrac{1}{\beta } \le x \le 0} \hfill \\ {\beta^{2} x - \beta ,} \hfill & {0 \le x \le \tfrac{1}{\beta }} \hfill \\ {0,} \hfill & {\text{elsewhere}} \hfill \\ \end{array} } \right.. $$
(39)

The parameter \( \beta \) is set to 5 in this paper. From (39), it can be seen that coefficients lying in the range \( ( - 1/\beta ,1/\beta ) \) are constantly attracted toward zero, whereas coefficients outside this range receive no additional attraction. This speeds up the convergence of the coefficients close to zero and hence accelerates the overall convergence.
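
A direct element-wise implementation of (39) could look like the following sketch (Python/NumPy); the handling of the boundary point \( w = 0 \), where the two branches of (39) meet, is an assumption (it is mapped to zero).

```python
import numpy as np

def f_beta(w, beta=5.0):
    """Zero-attraction function of Eq. (39), applied element-wise to the vector w.

    Inside [-1/beta, 1/beta] it equals beta^2*w - beta*sign(w), i.e. beta^2*w + beta
    for negative w and beta^2*w - beta for positive w; outside that range it is zero
    (the point w = 0 is mapped to 0).
    """
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) <= 1.0 / beta, beta ** 2 * w - beta * np.sign(w), 0.0)
```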

As in Sect. 3.1, the censored output is first compensated to obtain Eq. (13), and the error is then expressed as in Eq. (16). Next, \( {\varvec{\upalpha}} \) is estimated via the steepest ascent recursion (24). The next step is to estimate the parameter \( \sigma_{i} \); differentiating the \( l_{0} \)-CRPFRMS cost function gives

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial \vartheta_{{l_{0} {\text{-CRPFRMS}}}} ({\mathbf{w}},\sigma_{i,n - 1} ,{\varvec{\upalpha}})}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ). \\ \end{aligned} $$
(40)

Similarly, the weight update formula can be obtained as follows

$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1}\Phi _{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ) \\ \end{aligned} $$
(41)

where

$$ {\mathbf{w}}_{n} = (w_{n} (0), \ldots ,w_{n} (L - 1)) $$
(42)
$$ {\mathbf{G}}_{n - 1} = {\text{diag}}\{ g_{n - 1} (0), \ldots ,g_{n - 1} (L - 1)\} . $$
(43)

The diagonal elements of \( {\mathbf{G}}_{n - 1} \) are calculated as follows:

$$ g_{n - 1} (l) = \frac{{\theta_{n - 1} (l)}}{{\sum\nolimits_{i = 0}^{L - 1} {\theta_{n - 1} (i)} }},\quad 0 \le l \le L - 1 $$
(44)
$$ \theta_{n - 1} (l) = \hbox{max} \{ \rho \hbox{max} [|w_{n - 1} (0)|, \ldots ,|w_{n - 1} (L - 1)|],|w_{n - 1} (l)|\} $$
(45)

where \( \rho = 5/L \).
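
Putting the proportional gains (44)–(45), the zero attractor (39), and the weight recursion (41) together, an \( l_{0} \)-CRPFRMS weight update could be sketched as follows (Python/SciPy, Gaussian case); the parameter values are illustrative, and the f_beta helper repeats the sketch given after (39).

```python
import numpy as np
from scipy.stats import norm

def f_beta(w, beta=5.0):
    """Zero attractor of Eq. (39), element-wise (same as the sketch after Eq. (39))."""
    return np.where(np.abs(w) <= 1.0 / beta, beta ** 2 * w - beta * np.sign(w), 0.0)

def proportional_gains(w, rho):
    """Diagonal entries of G_{n-1}, Eqs. (44)-(45)."""
    theta = np.maximum(rho * np.max(np.abs(w)), np.abs(w))
    return theta / np.sum(theta)

def l0_crpfrms_weight_step(w, sigma, alpha_hat, u_n, d_n,
                           mu=0.005, p=2.0, zeta=1e-3, kappa=1e-6, beta=5.0):
    """Weight update of Eq. (41) with the proportional gains and the zero attractor."""
    g = proportional_gains(w, rho=5.0 / len(w))
    z = u_n @ alpha_hat
    eps = d_n - (norm.cdf(z) * (u_n @ w) + sigma * norm.pdf(z))  # compensated error, Eq. (16)
    f = np.abs(eps) ** p / (zeta + np.abs(eps) ** (p + 1))       # FRMS weight, Eq. (5)
    return w + mu * g * norm.cdf(z) * f * eps * u_n - kappa * f_beta(w, beta)
```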

3.3 Convergence Analysis of the \( l_{0} \)-CRPFRMS Algorithm

The steady-state performance of the \( l_{0} \)-CRPFRMS algorithm is analyzed in this subsection. To make the analysis tractable, the following assumptions, which are commonly used in the analysis of adaptive filtering algorithms [3, 13], are made.

Assumption 1

The noise \( \eta_{n} \) is independent of the input signal, and the impulsive noise \( \xi_{n} \) does not occur.

Assumption 2

The weight error vector \( {\boldsymbol{\hat{w}}}_{n} = {\mathbf{w}}_{o} - {\mathbf{w}}_{n} \) is independent of the input signal.

Obviously, (24) can be rewritten as

$$ {\boldsymbol{\hat{\alpha }}}_{n} = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ). $$
(46)

Then, the Taylor expansion of \( \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \) around the true vector \( {\varvec{\upalpha}} \) is given by

$$ \begin{aligned} \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) & = \varGamma_{n}^{\prime } ({\varvec{\upalpha}}) + \varGamma_{n}^{\prime \prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} )({\boldsymbol{\hat{\alpha }}}_{n - 1} - {\varvec{\upalpha}}) \\ & = \varGamma_{n}^{\prime } ({\varvec{\upalpha}}) - {\boldsymbol{\rm Z}}_{n} ({\varvec{\upalpha}} - {\boldsymbol{\hat{\alpha }}}_{n - 1} ) \\ \end{aligned} $$
(47)

where \( {\boldsymbol{\rm Z}}_{n} = \varGamma_{n}^{\prime \prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \). Letting \( {\boldsymbol{\tilde{\alpha }}}_{n - 1} = {\varvec{\upalpha}} - {\boldsymbol{\hat{\alpha }}}_{n - 1} \), subtracting both sides of (46) from \( {\varvec{\upalpha}} \) and using (47) yields

$$ {\boldsymbol{\tilde{\alpha }}}_{n} = {\boldsymbol{\tilde{\alpha }}}_{n - 1} - \mu \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) = ({\mathbf{I}}_{M} + \mu {\boldsymbol{\rm Z}}_{n} ){\boldsymbol{\tilde{\alpha }}}_{n - 1} - \mu \varGamma_{n}^{\prime } ({\varvec{\upalpha}}). $$
(48)

Using Assumptions A1–A2 and taking the expectation of both sides of (48) leads to

$$ E[{\boldsymbol{\tilde{\alpha }}}_{n} ] = ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ])E[{\boldsymbol{\tilde{\alpha }}}_{n - 1} ] - \mu E[\varGamma_{n}^{\prime } ({\varvec{\upalpha}})]. $$
(49)

According to [24], \( E[{\boldsymbol{\rm Z}}_{n} ] \) is the Hessian matrix of \( \varGamma_{n} ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \) and is negative definite. In the sequel, taking the expectation with respect to \( v_{n} \) only results in

$$ E_{v} [a_{n} ] = Pr(a_{n} = 1) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}). $$
(50)

Using (50) and Assumptions A1–A2 yields (see [24] for details)

$$ E[\varGamma_{n}^{\prime } ({\varvec{\upalpha}})] = 0. $$
(51)

Substituting (51) into (49) yields

$$ E[{\boldsymbol{\tilde{\alpha }}}_{n} ] = ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ])E[{\boldsymbol{\tilde{\alpha }}}_{n - 1} ]. $$
(52)

To guarantee stability in the mean sense, the matrix \( ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ]) \) should be stable. Since \( E[{\boldsymbol{\rm Z}}_{n} ] \) is negative definite, its eigenvalues are negative, and hence the step size should be selected according to

$$ 0 < \mu < - \frac{2}{{\lambda_{\hbox{min} } (E[{\boldsymbol{\rm Z}}_{n} ])}}. $$
(53)

Under this condition, we have

$$ E[{\boldsymbol{\hat{\alpha }}}_{\infty } ] = {\varvec{\upalpha}}. $$
(54)

In other words, Eq. (24) provides an unbiased estimate of \( {\varvec{\upalpha}} \) if the proposed algorithm is stable. Then, (40) and (41) can be rewritten, respectively, as

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial \vartheta_{{l_{0} {\text{-CRPFRMS}}}} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} )}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(55)
$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1} \varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ) \\ & \approx {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1} \varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ). \\ \end{aligned} $$
(56)

Combining (55) and (56) gives

$$ \begin{aligned} {\boldsymbol{\hat{\theta }}}_{n} & = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{n} } \\ {\hat{\sigma }_{i,n} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{n - 1} } \\ {\hat{\sigma }_{i,n - 1} } \\ \end{array} } \right] + \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ){\boldsymbol{\bar{G}}}_{n - 1} {\boldsymbol{\hat{h}}}_{n} \\ & \quad - \left[ {\begin{array}{*{20}c} {\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )} \\ 0 \\ \end{array} } \right] \\ \end{aligned} $$
(57)

where

$$ {\boldsymbol{\bar{G}}}_{n - 1} = \left[ {\begin{array}{*{20}c} {{\mathbf{G}}_{n - 1} } & 0 \\ 0 & 1 \\ \end{array} } \right] $$
(58)
$$ {\boldsymbol{\hat{h}}}_{n} = \left[ {\begin{array}{*{20}c} {\varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )u_{n} } \\ {\varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )} \\ \end{array} } \right]. $$
(59)

Then, the desired signal \( d_{n} \) can be written as

$$ d_{n} = {\mathbf{h}}_{n}^{\text{T}} {\varvec{\uptheta}}_{\text{opt}} + v_{n} $$
(60)

where

$$ {\varvec{\uptheta}}_{\text{opt}} = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{o} } \\ {\sigma_{i} } \\ \end{array} } \right] $$
(61)
$$ {\mathbf{h}}_{n} = \left[ {\begin{array}{*{20}c} {\varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}})u_{n} } \\ {\varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}})} \\ \end{array} } \right]. $$
(62)

Using (60), (16) implies

$$ \varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} ) \approx \varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\varvec{\upalpha}}) = d_{n} - {\mathbf{h}}_{n}^{\text{T}} {\boldsymbol{\hat{\theta }}}_{n - 1} = {\mathbf{h}}_{n}^{\text{T}} {\boldsymbol{\tilde{\theta }}}_{n - 1} + v_{n} $$
(63)

where

$$ {\boldsymbol{\tilde{\theta }}}_{n} = {\varvec{\uptheta}}_{\text{opt}} - {\boldsymbol{\hat{\theta }}}_{n} . $$
(64)

Subtracting (57) from \( {\varvec{\uptheta}}_{\text{opt}} \) and using (63) and (64) yields

$$ \begin{aligned} {\boldsymbol{\tilde{\theta }}}_{n} & = ({\mathbf{I}} - \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ){\boldsymbol{\tilde{\theta }}}_{n - 1} \\ & \quad - \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}))v_{n} {\boldsymbol{\bar{G}}}_{n - 1} {\mathbf{h}}_{n} + \left[ {\begin{array}{*{20}c} {\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )} \\ 0 \\ \end{array} } \right] \\ \end{aligned} $$
(65)

where

$$ {\boldsymbol{\rm H}}_{n} = {\boldsymbol{\bar{G}}}_{n - 1} {\mathbf{h}}_{n} {\mathbf{h}}_{n}^{\text{T}} . $$
(66)

Then, taking the expectation of both sides of (65) gives

$$ \begin{aligned} E[{\boldsymbol{\tilde{\theta }}}_{n} ] & = ({\mathbf{I}} - \mu E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ])E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] \\ & \quad - \mu E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}))]E[v_{n} ]E[{\boldsymbol{\bar{G}}}_{n - 1} ]E[{\mathbf{h}}_{n} ] + \left[ {\begin{array}{*{20}c} {E[\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )]} \\ 0 \\ \end{array} } \right] \\ & = ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} )E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] + \left[ {\begin{array}{*{20}c} {E[\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )]} \\ 0 \\ \end{array} } \right] \\ & \approx ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} )E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] \\ \end{aligned} $$
(67)

where

$$ {\boldsymbol{\Re }}_{n} = E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ]. $$
(68)

In order to ensure the convergence of the algorithm, all eigenvalues of \( ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} ) \) should lie inside the unit circle, i.e., \( |1 - \mu \lambda_{i} ({\boldsymbol{\Re }}_{n} )| < 1 \). Therefore, the range of \( \mu \) is

$$ 0 < \mu < \frac{2}{{\lambda_{\hbox{max} } ({\boldsymbol{\Re }}_{n} )}}. $$
(69)

Combining (53) and (69), we can obtain

$$ 0 < \mu < \hbox{min} \left( { - \frac{2}{{\lambda_{\hbox{min} } (E[{\boldsymbol{\rm Z}}_{n} ])}},\frac{2}{{\lambda_{\hbox{max} } ({\boldsymbol{\Re }}_{n} )}}} \right). $$
(70)

4 Simulation

4.1 Verify the Superiority of CR-FRMS

Firstly, simulations in the context of system identification are carried out to illustrate the advantage of the proposed algorithm. The unknown system is generated randomly with 8 taps, and the adaptive filter is assumed to have the same length as the unknown system. Figure 1a, b depicts the performance of the CR-FRMS, FRMS [24], TSA [11], and LMS [13] algorithms under two background noise distributions, respectively, where \( p = 2 \) and \( \kappa = 8 \times 10^{ - 6} \). The step sizes of the four algorithms are set to 0.03 in both mixed noise environments. Figure 1a shows the simulation results in a mixture of sub-Gaussian and super-Gaussian noises, and Fig. 1b shows the results in a mixture of sub-Gaussian noise and impulsive noise [the uniform and Laplace distributed noises are independent and identically distributed (i.i.d.) over time with zero mean]. The Bernoulli–Gaussian (BG) process [41] is frequently used to model the impulsive noise \( \xi_{n} \), formulated as \( \xi_{n} = \tau_{n} \upsilon_{n} \), where \( \tau_{n} \) is a Bernoulli process with probability \( p\{ \tau_{n} = 1\} = 0.01 \) and \( \upsilon_{n} \) is an i.i.d. zero-mean Gaussian sequence with variance \( \sigma = 10000 \). For a fair comparison, the parameters are set so that the algorithms have a comparable initial convergence speed. As observed in Fig. 1a, b, the proposed CR-FRMS algorithm is superior to the other existing algorithms. In addition, the MSD is defined as

$$ {\text{MSD}} = 10\log_{10} \left\| {{\mathbf{w}}_{o} - {\mathbf{w}}_{n} } \right\|^{2} $$
(71)
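
In code, (71) amounts to a one-line helper (Python/NumPy; w_o and w_n are the true and estimated weight vectors):

```python
import numpy as np

def msd_db(w_o, w_n):
    """MSD = 10*log10(||w_o - w_n||^2), Eq. (71)."""
    return 10.0 * np.log10(np.sum((w_o - w_n) ** 2))
```
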
Fig. 1

MSD curves for the proposed algorithm a the mixed uniform and Laplace distributed noises, where \( \varsigma = 0.5 \). b The mixed uniform distributed and impulsive noises, where \( \varsigma = 0.005 \)

The second experiment tests the convergence performance of CR-FRMS with different step sizes. Figure 2a, b illustrates the MSD learning curves for different step sizes with a white Gaussian input signal. As expected, when a fixed step size is applied in the CR-FRMS algorithm, there is a trade-off between the steady-state MSD and the convergence rate. That is, a small step size yields a lower steady-state error but slows down the convergence. In contrast, a large step size within the stable range provides faster convergence at the cost of a larger steady-state error.

Fig. 2

Comparison of different step sizes; a the mixed noise environment of sub-Gaussian and super-Gaussian. b The mixed noise environment of sub-Gaussian and impulsive noise

The third experiment tests the convergence performance of CR-FRMS with different values of the parameter \( p \). Suppose that the unknown system has 8 coefficients. The input signal is white Gaussian with unit variance, and the filter length is 8. The step size is fixed at \( 3 \times 10^{ - 2} \), while \( p \) is set to different values. After one hundred independent runs, it can be seen from Fig. 3a that, when the uniform and Laplace distributions have variances of 1 and 9, respectively, a larger \( p \) gives a smaller MSD, although the effect is not significant. However, as can be seen from Fig. 3b, in the case of mixed background noise consisting of impulsive noise and sub-Gaussian noise (uniform distribution) with a signal-to-noise ratio (SNR) of 30 dB, \( p = 2 \) achieves a lower MSD than \( p = 1 \) or \( p = 3 \).

Fig. 3

Comparison of different \( p \); a the mixed noise environment of sub-Gaussian and super-Gaussian noise. b The mixed noise environment of sub-Gaussian and impulsive noise

4.2 Verify the Superiority of \( l_{0} \)-CRPFRMS

The proposed \( l_{0} \)-CRPFRMS is compared with the CR-FRMS, PNLMS [9], and \( l_{0} \)-LMS [5] algorithms in two different background noise environments. The first is a mixed noise composed of a uniform distribution with variance \( \sigma_{\text{u}} = 2.5 \times 10^{ - 5} \) and a Laplace distribution with variance \( \sigma_{\text{L}} = 0.01 \). The second is a mixed noise composed of a sub-Gaussian noise (uniform distribution) with a signal-to-noise ratio of 30 dB and the impulsive noise. In both mixed background noise environments, \( p = 2 \). Suppose that the unknown system has 64 coefficients, of which two are nonzero (their locations and values are randomly selected). The input signal is white Gaussian with unit variance, and the filter length is 64. After fifty independent runs, the MSD curves are shown in Fig. 4a, b. It is clearly seen that the \( l_{0} \)-CRPFRMS algorithm converges faster than its counterparts.

Fig. 4

MSD curves for the proposed algorithm a the mixed sub-Gaussian and super-Gaussian noises, where \( \varsigma = 1 \times 10^{ - 6} \), \( \mu = 0.001 \), \( \kappa = 2 \times 10^{ - 6} \). b The mixed sub-Gaussian and impulsive noises, where \( \varsigma = 0.001 \), \( \mu = 0.005 \), \( \kappa = 1 \times 10^{ - 6} \)

It is also necessary to examine experimentally how different values of the parameter \( \kappa \) affect the proposed algorithm. Theoretically, a larger \( \kappa \) drives the updated weights toward zero faster, that is, it accelerates the convergence of the algorithm; however, it also increases the MSD. In this paper, the range of values of \( \kappa \) is relatively small. Therefore, as can be seen from Fig. 5a, b, the convergence speed is almost the same for the different values of \( \kappa \), but the MSD differs noticeably, that is, the smaller the \( \kappa \), the smaller the MSD.

Fig. 5

Comparison of different \( \kappa \) in a the mixed sub-Gaussian and super-Gaussian noises, b mixed sub-Gaussian and impulsive noises

5 Conclusion

In this paper, two algorithms based on censored regression, namely CR-FRMS and \( l_{0} \)-CRPFRMS, are proposed for two different mixed background noises. CR-FRMS shows superiority over LMS, TSA, and FRMS in terms of MSD. Meanwhile, when the unknown system is sparse, \( l_{0} \)-CRPFRMS exhibits a faster convergence speed than CR-FRMS without changing the steady-state MSD. The two algorithms have different advantages: CR-FRMS has lower computational complexity, while \( l_{0} \)-CRPFRMS converges faster. Therefore, in practice, a reasonable algorithm should be chosen according to the actual requirements.