1 Introduction

Linear regression models are widely used in signal processing, communication, and many other fields, under the assumption that the observed data are fully available. To identify such models, researchers have proposed a large number of adaptive algorithms, such as the least mean square (LMS) algorithm [13], the normalized LMS (NLMS) algorithm, and the affine projection algorithm (APA) [14, 30, 33, 42].

Unfortunately, the requirement of a fully observed linear regression may not be met in many practical applications. In general, output data whose value exceeds the limit of the recording device cannot be observed [3, 15, 29]; in other words, only the output data whose values lie in a certain range are available. This situation occurs in economics [4], statistics [11, 27], engineering applications [28, 26], and medical research [7, 31]. In some systems, complete data cannot be collected efficiently because of the saturation characteristics of sensors [1, 9, 43]. For example, in microphone array signal processing [2, 16], when the amplitude of a speech signal exceeds a certain threshold of the microphones, it is cut off and the signal waveform may be flattened, because the positive and negative peaks exceeding the threshold are lost, or censored. In fact, censored regression can be seen as a nonlinear regression model consisting of a saturating nonlinearity and a linear system [23, 38]. Since the output data of a censored regression may lose significant information, using traditional algorithms to identify this type of model may result in biased and wrong estimates [22]. Recently, in an attempt to deal with the censored regression problem, numerous algorithms have been proposed, such as maximum likelihood (ML) methods [8], the two-step estimator [15], and least absolute deviation [29]. To solve online censored regression problems, Liu et al. [24] proposed the adaptive Heckman two-step algorithm (TSA), which significantly outperforms the conventional adaptive algorithms when the output data is censored.

As is well known, most adaptive algorithms are designed for a Gaussian noise environment. However, real-world signals often exhibit non-Gaussian properties. For example, in beamforming, sub-Gaussian (light-tailed) signals are frequently encountered [17]. In some active noise control (ANC) applications, mechanical friction, vibration noise, and speech signals are super-Gaussian/impulsive (heavy-tailed) signals [39]. In blind source separation (BSS), adaptive receivers with multiple antennas, and image denoising, the signal may be a mixture of sub-Gaussian and super-Gaussian/impulsive signals [18,19,20, 32, 35, 40, 46].

In an attempt to improve the robustness of adaptive algorithms in mixed non-Gaussian background noise environments, a family of robust M-shaped (FRMS) functions was applied to the adaptive algorithms [45].

Furthermore, when the system exhibits a certain degree of sparsity, the aforementioned algorithms cannot exploit this characteristic of the system. To deal with this issue, proportional algorithms [10] and zero-attraction algorithms based on various norms, such as the \( l_{0} \)-norm, \( l_{1} \)-norm, and \( l_{p} \)-norm [5, 12, 25, 34, 37, 39, 46], have been applied to this type of system. In fact, the ideal sparsity measure is the \( l_{0} \)-norm, which counts the number of nonzero components. Therefore, the \( l_{0} \)-norm constraint is adopted in this paper.

The contributions of this paper are as follows:

  1. (i)

    For the first time, a family of robust M-shaped (FRMS) algorithms for mixed non-Gaussian background noises under the censored regression model is proposed. The algorithm not only effectively compensates the censored output signal, but also copes with the effects of sub-Gaussian and super-Gaussian/impulsive noises.

  2. (ii)

    To exploit the characteristics of sparse systems, the \( l_{0} \)-norm proportional FRMS algorithm under the censored regression model (\( l_{0} \)-CRPFRMS) is also proposed for the first time.

  3. (iii)

    Simulation examples are provided to demonstrate the performance of the proposed algorithms in non-Gaussian background noise environments.

2 Description and Preliminaries

2.1 Problem Formulation

Consider the following linear regression model

$$ \hat{d}_{n} = {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \eta_{n} + \xi_{n} $$
(1)

where \( {\mathbf{w}}_{o} \) is the unknown weight vector to be identified, \( {\mathbf{u}}_{n} \) is the input vector, \( \eta_{n} \) denotes the background noise with zero mean and variance \( \sigma_{i}^{2} \) (both sub-Gaussian and super-Gaussian noises are considered in this paper), and \( \xi_{n} \) represents the impulsive noise with zero mean. When the data \( \hat{d}_{n} \) and \( {\mathbf{u}}_{n} \) are completely observed, the unknown vector \( {\mathbf{w}}_{o} \) can be identified easily by traditional methods. However, \( \hat{d}_{n} \) may not be completely observed in some practical applications. The censored output \( d_{n} \) is formulated as

$$ d_{n} = ({\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \eta_{n} )_{ + } = (\hat{d}_{n} )_{ + } $$
(2)

where \( (\hat{d}_{n} )_{ + } = \hbox{max} \{ 0,\hat{d}_{n} \} \).
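
To make the model concrete, a minimal data-generation sketch for (1)–(2) is given below (Python/NumPy); the system length, noise level, Gaussian input, and the omission of the impulsive term \( \xi_{n} \) are illustrative assumptions rather than settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 1000                      # filter length and number of samples (assumed)
w_o = rng.standard_normal(L)        # unknown system w_o
U = rng.standard_normal((N, L))     # input regressors u_n
eta = 0.1 * rng.standard_normal(N)  # zero-mean background noise

d_hat = U @ w_o + eta               # uncensored output, Eq. (1) without the impulsive term
d = np.maximum(0.0, d_hat)          # left-censored output, Eq. (2)
print(f"{np.mean(d_hat < 0):.1%} of the samples are censored")
```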

Remark 1

In this paper, although only left-censoring at 0 is considered in the algorithm derivation, the right-censored and two-sided-censored cases can be handled in the same framework, and the processing method is similar. In particular, if the output \( \hat{d}_{n} \) is left-censored or right-censored at a constant \( c \), the censored output \( d_{n} \) can be expressed as

$$ d_{n} = \hbox{max} \{ c,\hat{d}_{n} \} = (\hat{d}_{n} - c)_{ + } + c $$
(3)

and

$$ d_{n} = \hbox{min} \{ c,\hat{d}_{n} \} = c - (c - \hat{d}_{n} )_{ + } $$
(4)

respectively.

2.2 Review of FRMS Algorithm

In [23], the authors divide the existing error-nonlinearity adaptive algorithms based on LMS into three categories: V-shaped, \( \varLambda \)-shaped, and M-shaped algorithms. After comparing the advantages and disadvantages of each category, the FRMS function was proposed, whose weighting function is

$$ f(e_{n} ) = \frac{{|e_{n} |^{p} }}{{\varsigma + |e_{n} |^{p + 1} }},p > 0 $$
(5)

where \( \varsigma > 0 \) is a shape parameter. For instance, when \( \varsigma \to 0 \), the algorithm behaves as a \( \varLambda \)-shaped algorithm (\( \varLambda \)-shaped algorithms are better suited to super-Gaussian noise than to sub-Gaussian noise). When \( \varsigma \to \infty \), it behaves as a V-shaped algorithm (V-shaped algorithms are more suitable for sub-Gaussian noise). In the proposed robust M-shaped function, the denominator is one order higher than the numerator. Using gradient descent, this algorithm essentially minimizes the stochastic cost function

$$ J(e_{n} ) = \int_{0}^{{e_{n} }} {\frac{{x|x|^{p} }}{{\varsigma + |x|^{p + 1} }}} {\text{d}}x $$
(6)

which is positive semi-definite for \( \varsigma > 0 \) and \( p > 0 \). Thus, the proposed robust M-shaped algorithm does not suffer from the local-minimum problem.
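
To illustrate how \( \varsigma \) interpolates between the \( \varLambda \)-shaped and V-shaped behaviors, the weighting function (5) can be evaluated directly, as in the following sketch (Python/NumPy; the error grid and the values of \( p \) and \( \varsigma \) are arbitrary).

```python
import numpy as np

def frms_weight(e, p=2.0, zeta=0.5):
    """FRMS weighting function f(e) = |e|^p / (zeta + |e|^(p+1)), Eq. (5)."""
    return np.abs(e) ** p / (zeta + np.abs(e) ** (p + 1))

e = np.linspace(-5, 5, 11)
print(frms_weight(e, p=2, zeta=1e-3))  # small zeta: weight decays with |e| (Lambda-shaped behavior)
print(frms_weight(e, p=2, zeta=1e3))   # large zeta: weight grows with |e| (scaled V-shaped behavior)
```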

3 Proposed New Algorithm

3.1 Proposed CR-FRMS Algorithm

Since the observations in this paper are assumed to be censored, the FRMS algorithm would obtain a biased estimate, an effect known as sample selection bias [34, 44]. In other words, when \( \hat{d}_{n} < 0 \), the data \( \hat{d}_{n} \) is missing, which leads to the bias and the inequality \( E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] \ne {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} \). To compensate for the bias, it is first necessary to correct the sample selection bias of FRMS in the censored observations, which is inspired by the Heckman two-step approach [15]. Due to the left-censored property of \( d_{n} \), only positive values of \( d_{n} \) can be observed correctly. Recalling (1) and noting that the background noise \( \eta_{n} \) and the impulsive noise \( \xi_{n} \) are both zero-mean signals, the expectation of \( d_{n} \) under the condition \( d_{n} > 0 \) can be expressed as

$$ \begin{aligned} & E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,d_{n} > 0] \\ & \quad = E[\hat{d}_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,\hat{d}_{n} > 0] \\ & \quad = {\mathbf{u}}_{n}^{T} {\mathbf{w}}_{o} + E[\eta_{n} + \xi_{n} |\eta_{n} + \xi_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ]. \\ \end{aligned} $$
(7)

Since the impulsive noise occurs with a very low probability, the following approximation is reasonable,

$$ E[\eta_{n} + \xi_{n} |\eta_{n} + \xi_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] \approx E[\eta_{n} |\eta_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] $$
(8)

Before calculating the last term of (7), the following lemma is introduced.

Lemma

The conditional expectation \( E[x|x > - c] \) satisfies

$$ E[x|x > - c] = \frac{\varphi (c)}{{\Phi (c)}} = \varOmega (c). $$
(9)

Proof

See “Appendix” for detail.
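
For the Gaussian case (where \( \varphi \) and \( \varPhi \) are the standard normal pdf and cdf), the lemma can be checked numerically with a short Monte Carlo sketch (Python/SciPy; the value of \( c \) and the sample size are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
c = 0.7
x = rng.standard_normal(2_000_000)
empirical = x[x > -c].mean()               # Monte Carlo estimate of E[x | x > -c]
inverse_mills = norm.pdf(c) / norm.cdf(c)  # Omega(c) = phi(c) / Phi(c), Eq. (9)
print(empirical, inverse_mills)            # the two values should be close
```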

Using the lemma, we have

$$ E[\eta_{n} |\eta_{n} > - {\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] = \sigma_{i} \varOmega ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) $$
(10)

with \( \varphi ( \cdot ) \) and \( \varPhi ( \cdot ) \) given in Table 1. In addition, the vector \( {\varvec{\upalpha}} \) is given by

$$ {\varvec{\upalpha}} = \frac{{{\mathbf{w}}_{o} }}{{\sigma_{i} }} $$
(11)
Table 1 Probability density function and distribution function of three different noises

Then, using (7), (10), and the law of total expectation yields

$$ \begin{aligned} E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] & = Pr(d_{n} > 0)E[d_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ,d_{n} > 0] \\ & = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \\ \end{aligned} $$
(12)

where the second equality follows from the fact that the probability of \( d_{n} > 0 \) equals \( \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \), i.e., \( Pr(d_{n} > 0) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) \). According to (12), the censored regression model (2) can be expressed as:

$$ d_{n} = {\varPhi} ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) + v_{n} $$
(13)

where \( v_{n} \) is a zero-mean random variable, i.e.,

$$ E[v_{n} |{\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}}_{o} ] = 0. $$
(14)

Since algorithms based on the MSE criterion cannot estimate \( {\mathbf{w}}_{o} \) correctly in this setting, the cost function based on FRMS is adopted:

$$ \zeta_{\text{FRMS}} = E\left( {\frac{{|\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})|^{p} }}{{\varsigma + |\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})|^{p + 1} }}} \right) $$
(15)

where

$$ \varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}}) = d_{n} - \left( {\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})} \right). $$
(16)

Obviously, to estimate \( {\mathbf{w}}_{o} \), estimates of \( {\varvec{\upalpha}} \) and \( \sigma_{i} \) are also required. In the sequel, an indicator variable \( a_{n} \) is introduced to estimate \( {\varvec{\upalpha}} \),

$$ a_{n} = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {{\text{if}}\;d_{n} > 0} \hfill \\ {0,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(17)

The probabilities of \( a_{n} \) are given by

$$ Pr(a_{n} = 1) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}) $$
(18)
$$ Pr(a_{n} = 0) = \varPhi ( -{\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}). $$
(19)

Then, the following optimization problem is considered to estimate \( {\varvec{\upalpha}} \) [22],

$$ {\boldsymbol{\hat{\alpha}}} = \arg \mathop {\hbox{max} }\limits_{{{\boldsymbol{\bar{\alpha }}}}} \varGamma ({\boldsymbol{\bar{\alpha }}}) = \arg \mathop {\hbox{max} }\limits_{{{\boldsymbol{\bar{\alpha }}}}} E[\varGamma_{n} ({\boldsymbol{\bar{\alpha }}})] $$
(20)

where

$$ \varGamma ({\boldsymbol{\bar{\alpha }}}) = E[\varGamma_{n} ({\boldsymbol{\bar{\alpha }}})] $$
(21)

and

$$ \varGamma_{n} ({\boldsymbol{\bar{\alpha }}}) = \log (Pr(d_{n} |{\mathbf{u}}_{n} ,{\varvec{\upalpha}})) $$
(22)

with

$$ Pr(d_{n} |{\mathbf{u}}_{n} ,{\varvec{\upalpha}}) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})^{{a_{n} }} \varPhi ( -{\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}})^{{1 - a_{n} }} . $$
(23)

In the sequel, using the steepest ascent principle yields [22]

$$ \begin{aligned} {\boldsymbol{\hat{\alpha }}}_{n} & = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu \frac{{\partial \varGamma_{n} ({\varvec{\upalpha}})}}{{\partial {\varvec{\upalpha}}}}|_{{{\boldsymbol{\hat{\alpha }}}_{n - 1} }} \\ & = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu [a_{n} \varOmega (u_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n - 1} ){\mathbf{u}}_{n} - (1 - a_{n} )\varOmega ( - {\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n - 1} ){\mathbf{u}}_{n} ]. \\ \end{aligned} $$
(24)
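
For the Gaussian case, the steepest-ascent recursion (24) could be sketched as follows (Python/SciPy); the step size, the indicator construction, and the small constant guarding the division are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def omega(c):
    """Omega(c) = phi(c) / Phi(c), Eq. (9), for the Gaussian case."""
    return norm.pdf(c) / np.maximum(norm.cdf(c), 1e-12)

def alpha_update(alpha_hat, u_n, d_n, mu=0.01):
    """One steepest-ascent step of Eq. (24), with a_n = 1 if d_n > 0 and 0 otherwise."""
    a_n = 1.0 if d_n > 0 else 0.0
    z = u_n @ alpha_hat
    return alpha_hat + mu * (a_n * omega(z) - (1.0 - a_n) * omega(-z)) * u_n
```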

Then, the estimation of \( \sigma_{i} \) is considered. Applying the gradient descent method to the cost function \( f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}})) \) with respect to \( \sigma_{i} \), evaluated at \( \hat{\sigma }_{i,n - 1} \), we have

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i} ,{\varvec{\upalpha}}))}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(25)

where

$$ f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} )) = \frac{{|d_{n} - (\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}))|^{p} }}{{\varsigma + |d_{n} - (\varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}){\mathbf{u}}_{n}^{\text{T}} {\mathbf{w}} + \sigma_{i} \varphi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}))|^{p + 1} }}. $$
(26)

Similarly, the weight update formula can be obtained by the gradient descent method

$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} - \mu \frac{{\partial \zeta_{\text{FRMS}} ({\mathbf{w}})}}{{\partial {\mathbf{w}}}}|_{{{\mathbf{w}}_{n - 1} }} \\ & = {\mathbf{w}}_{n - 1} + \mu\Phi _{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(27)

where

$$ \zeta_{\text{FRMS}} ({\mathbf{w}}) = f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} )). $$
(28)
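
Combining the compensated error (16) with the updates (25) and (27), one CR-FRMS iteration for Gaussian background noise could be sketched as below (Python/SciPy). Here \( {\boldsymbol{\hat{\alpha }}}_{n} \) is assumed to have already been refreshed by the recursion (24) (e.g., via the alpha_update sketch above), the signs follow the model (13), and the step size and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def frms_weight(e, p=2.0, zeta=0.5):
    """FRMS weighting function of Eq. (5)."""
    return np.abs(e) ** p / (zeta + np.abs(e) ** (p + 1))

def crfrms_step(w, sigma, alpha_hat, u_n, d_n, mu=0.03, p=2.0, zeta=0.5):
    """One CR-FRMS iteration: compensation (16), scale update (25), weight update (27)."""
    z = u_n @ alpha_hat
    # compensated error, built from the censored-regression model (13)
    eps = d_n - (norm.cdf(z) * (u_n @ w) + sigma * norm.pdf(z))
    f = frms_weight(eps, p, zeta)
    sigma = sigma + mu * norm.pdf(z) * f * eps   # Eq. (25)
    w = w + mu * norm.cdf(z) * f * eps * u_n     # Eq. (27)
    return w, sigma
```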

3.2 Proposed \( l_{0} \)-CRPFRMS Algorithm

In many practical applications, such as digital TV transmission channels and echo paths, the systems to be identified are sparse. However, the algorithm in Sect. 3.1 does not make full use of the characteristics of a sparse system. Therefore, this section proposes an \( l_{0} \)-norm proportional FRMS algorithm based on the censored regression model (\( l_{0} \)-CRPFRMS).

A sparse system is one whose impulse response contains many near-zero coefficients and only a few large ones. In [36], the author first proposed the proportional idea, i.e., the proportionate normalized LMS (PNLMS) algorithm. Its iteration is as follows:

$$ \hat{y}(k) = \sum\limits_{n = 0}^{N - 1} {\hat{h}_{n} (k)x(k - n)} $$
(29)
$$ e(k) = y(k) - \hat{y}(k) $$
(30)
$$ l_{\infty } (k) = \hbox{max} \{ |\hat{h}_{0} (k)|, \ldots ,|\hat{h}_{N - 1} (k)|\} $$
(31)
$$ l^{\prime}_{\infty } (k) = \hbox{max} \{ \delta ,l_{\infty } (k)\} $$
(32)
$$ g_{n} (k) = \hbox{max} \{ \rho l^{\prime}_{\infty } (k),|\hat{h}_{n} (k)|\} $$
(33)
$$ \bar{g}(k) = \frac{1}{N}\sum\limits_{n = 0}^{N - 1} {g_{n} (k)} $$
(34)
$$ \hat{\sigma }_{x}^{2} (k) = \frac{1}{N}\sum\limits_{n = 0}^{N - 1} {x^{2} (k - n)} $$
(35)
$$ \hat{h}_{n} (k + 1) = \hat{h}_{n} (k) + \frac{\mu }{N}\frac{{g_{n} (k)}}{{\bar{g}(k)}}\frac{e(k)x(k - n)}{{\hat{\sigma }_{x}^{2} (k)}} . $$
(36)

where \( x( \cdot ) \) is the input signal, \( \hat{y}( \cdot ) \) is the filter output, \( y( \cdot ) \) is the desired signal, and \( \hat{h}_{n} ( \cdot ) \) is the tap weight. The parameters \( \rho \) and \( \delta \) control the small-signal regularization: in (33), \( \rho \) prevents \( \hat{h}_{n} (k + 1) \) from stalling when it is much smaller than the largest coefficient, and in (32), \( \delta \) regularizes the update when all coefficients are zero at initialization. It can be seen from (31)–(34) and (36) that the closer a tap weight is to zero, the smaller the \( g_{n} (k)/\bar{g}(k) \) term becomes; conversely, a tap weight far from zero receives a larger gain in the iteration. In this way, the aim of the proportional algorithm is achieved, that is, the convergence speed is accelerated without changing the steady-state mean square deviation (MSD).
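
A compact sketch of one PNLMS iteration following (29)–(36) is given below (Python/NumPy); the values of \( \mu \), \( \rho \), and \( \delta \) are illustrative, and x_buf is assumed to hold the N most recent input samples \( x(k), \ldots ,x(k - N + 1) \).

```python
import numpy as np

def pnlms_step(h_hat, x_buf, y_k, mu=0.5, rho=0.01, delta=0.01):
    """One PNLMS iteration, Eqs. (29)-(36)."""
    N = len(h_hat)
    y_hat = h_hat @ x_buf                                 # Eq. (29)
    e_k = y_k - y_hat                                     # Eq. (30)
    l_inf = max(delta, np.max(np.abs(h_hat)))             # Eqs. (31)-(32)
    g = np.maximum(rho * l_inf, np.abs(h_hat))            # Eq. (33)
    g_bar = g.mean()                                      # Eq. (34)
    sigma_x2 = np.mean(x_buf ** 2)                        # Eq. (35)
    h_hat = h_hat + (mu / N) * (g / g_bar) * e_k * x_buf / sigma_x2  # Eq. (36)
    return h_hat, e_k
```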

To further accelerate the convergence of the algorithm, this section incorporates another sparsity-exploiting technique, the \( l_{0} \)-norm constraint [12, 34], on top of the proportional update. As the name implies, a penalty term \( \gamma ||{\mathbf{w}}||_{0} \) is added to the original cost function. In this section, the cost function becomes

$$ \vartheta (n) = \int_{0}^{{e_{n} }} {\frac{{x|x|^{p} }}{{\varsigma + |x|^{p + 1} }}} {\text{d}}x + \gamma ||{\mathbf{w}}_{n} ||_{0} . $$
(37)

where \( \gamma > 0 \) is a factor that balances the new penalty against the estimation error. In [12], in order to reduce the computational complexity, the author first approximates \( ||{\mathbf{w}}_{n} ||_{0} \) by \( \sum\nolimits_{i = 0}^{L - 1} {(1 - {\text{e}}^{{ - \beta |w_{n} (i)|}} )} \), then applies the gradient descent method to the cost function, then performs a first-order Taylor expansion on the zero-attracting term in the iterative formula, and finally obtains a weight update formula with lower computational complexity,

$$ {\mathbf{w}}_{n} = {\mathbf{w}}_{n - 1} + \;{\text{gradient}}\;{\text{correction}} + {\text{zero}}\;{\text{attraction}} $$
(38)

where the zero-attraction term is \( \kappa f_{\beta } ({\mathbf{w}}_{n} ) \), \( \kappa = \mu \gamma \) is a positive constant, and \( f_{\beta } ( \cdot ) \) is defined as

$$ f_{\beta } (x) = \left\{ {\begin{array}{*{20}l} {\beta^{2} x + \beta ,} \hfill & { - \tfrac{1}{\beta } \le x \le 0} \hfill \\ {\beta^{2} x - \beta ,} \hfill & {0 \le x \le \tfrac{1}{\beta }} \hfill \\ {0,} \hfill & {\text{elsewhere}} \hfill \\ \end{array} } \right.. $$
(39)

The parameter \( \beta \) is set to 5 in this paper. From (39), it can be seen that coefficients lying in the range \( ( - 1/\beta ,1/\beta ) \) are constantly attracted toward zero, whereas coefficients outside this range receive no additional attraction. This speeds up the convergence of the coefficients close to zero and hence accelerates the overall convergence.
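
A direct element-wise implementation of (39) could look like the following sketch (Python/NumPy); the handling of the boundary point \( w = 0 \), where the two branches of (39) meet, is an assumption (it is mapped to zero).

```python
import numpy as np

def f_beta(w, beta=5.0):
    """Zero-attraction function of Eq. (39), applied element-wise to the vector w.

    Inside [-1/beta, 1/beta] it equals beta^2*w - beta*sign(w), i.e. beta^2*w + beta
    for negative w and beta^2*w - beta for positive w; outside that range it is zero
    (the point w = 0 is mapped to 0).
    """
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) <= 1.0 / beta, beta ** 2 * w - beta * np.sign(w), 0.0)
```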

As in Sect. 3.1, the censored output is first compensated to obtain Eq. (13), and the error is then expressed as in Eq. (16). Next, \( {\varvec{\upalpha}} \) is estimated via the steepest ascent recursion (24). The next step is to estimate the parameter \( \sigma_{i} \); differentiating the \( l_{0} \)-CRPFRMS cost function gives

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial \vartheta_{{l_{0} {\text{-CRPFRMS}}}} ({\mathbf{w}},\sigma_{i,n - 1} ,{\varvec{\upalpha}})}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ). \\ \end{aligned} $$
(40)

Similarly, the weight update formula can be obtained as follows

$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1}\Phi _{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ) \\ \end{aligned} $$
(41)

where

$$ {\mathbf{w}}_{n} = (w_{n} (0), \ldots ,w_{n} (L - 1)) $$
(42)
$$ {\mathbf{G}}_{n - 1} = {\text{diag}}\{ g_{n - 1} (0), \ldots ,g_{n - 1} (L - 1)\} . $$
(43)

The diagonal elements of \( {\mathbf{G}}_{n - 1} \) are calculated as follows:

$$ g_{n - 1} (l) = \frac{{\theta_{n - 1} (l)}}{{\sum\nolimits_{i = 0}^{L - 1} {\theta_{n - 1} (i)} }},\quad 0 \le l \le L - 1 $$
(44)
$$ \theta_{n - 1} (l) = \hbox{max} \{ \rho \hbox{max} [|w_{n - 1} (0)|, \ldots ,|w_{n - 1} (L - 1)|],|w_{n - 1} (l)|\} $$
(45)

where \( \rho = 5/L \).
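
Putting the proportional gains (44)–(45), the zero attractor (39), and the weight recursion (41) together, an \( l_{0} \)-CRPFRMS weight update could be sketched as follows (Python/SciPy, Gaussian case); the parameter values are illustrative, and the f_beta helper repeats the sketch given after (39).

```python
import numpy as np
from scipy.stats import norm

def f_beta(w, beta=5.0):
    """Zero attractor of Eq. (39), element-wise (same as the sketch after Eq. (39))."""
    return np.where(np.abs(w) <= 1.0 / beta, beta ** 2 * w - beta * np.sign(w), 0.0)

def proportional_gains(w, rho):
    """Diagonal entries of G_{n-1}, Eqs. (44)-(45)."""
    theta = np.maximum(rho * np.max(np.abs(w)), np.abs(w))
    return theta / np.sum(theta)

def l0_crpfrms_weight_step(w, sigma, alpha_hat, u_n, d_n,
                           mu=0.005, p=2.0, zeta=1e-3, kappa=1e-6, beta=5.0):
    """Weight update of Eq. (41) with the proportional gains and the zero attractor."""
    g = proportional_gains(w, rho=5.0 / len(w))
    z = u_n @ alpha_hat
    eps = d_n - (norm.cdf(z) * (u_n @ w) + sigma * norm.pdf(z))  # compensated error, Eq. (16)
    f = np.abs(eps) ** p / (zeta + np.abs(eps) ** (p + 1))       # FRMS weight, Eq. (5)
    return w + mu * g * norm.cdf(z) * f * eps * u_n - kappa * f_beta(w, beta)
```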

3.3 Convergence Analysis of the \( l_{0} \)-CRPFRMS Algorithm

The steady-state performance of the \( l_{0} \)-CRPFRMS algorithm is analyzed in this subsection. To make the analysis tractable, the following assumptions, which are commonly used in the analysis of adaptive filtering algorithms [3, 13], are made.

Assumption 1

The noise \( \eta_{n} \) is independent of the input signal, and the impulsive noise \( \xi_{n} \) does not occur.

Assumption 2

The weight error vector \( {\boldsymbol{\hat{w}}}_{n} = {\mathbf{w}}_{o} - {\mathbf{w}}_{n} \) is independent of the input signal.

Obviously, (24) can be rewritten as

$$ {\boldsymbol{\hat{\alpha }}}_{n} = {\boldsymbol{\hat{\alpha }}}_{n - 1} + \mu \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ). $$
(46)

Then, the Taylor expansion of \( \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \) around the true vector \( {\varvec{\upalpha}} \) is given by

$$ \begin{aligned} \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) & = \varGamma_{n}^{\prime } ({\varvec{\upalpha}}) + \varGamma_{n}^{\prime \prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} )({\boldsymbol{\hat{\alpha }}}_{n - 1} - {\varvec{\upalpha}}) \\ & = \varGamma_{n}^{\prime } ({\varvec{\upalpha}}) - {\boldsymbol{\rm Z}}_{n} ({\varvec{\upalpha}} - {\boldsymbol{\hat{\alpha }}}_{n - 1} ) \\ \end{aligned} $$
(47)

where \( {\boldsymbol{\rm Z}}_{n} = \varGamma_{n}^{\prime \prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \). Letting \( {\boldsymbol{\tilde{\alpha }}}_{n - 1} = {\varvec{\upalpha}} - {\boldsymbol{\hat{\alpha }}}_{n - 1} \), subtracting both sides of (46) from \( {\varvec{\upalpha}} \) and using (47) yields

$$ {\boldsymbol{\tilde{\alpha }}}_{n} = {\boldsymbol{\tilde{\alpha }}}_{n - 1} - \mu \varGamma_{n}^{\prime } ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) = ({\mathbf{I}}_{M} + \mu {\boldsymbol{\rm Z}}_{n} ){\boldsymbol{\tilde{\alpha }}}_{n - 1} - \mu \varGamma_{n}^{\prime } ({\varvec{\upalpha}}). $$
(48)

Using Assumptions A1–A2 and taking the expectation of both sides of (48) leads to

$$ E[{\boldsymbol{\tilde{\alpha }}}_{n} ] = ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ])E[{\boldsymbol{\tilde{\alpha }}}_{n - 1} ] - \mu E[\varGamma_{n}^{\prime } ({\varvec{\upalpha}})]. $$
(49)

According to [24], \( E[{\boldsymbol{\rm Z}}_{n} ] \) is the Hessian matrix of \( \varGamma_{n} ({\boldsymbol{\hat{\alpha }}}_{n - 1} ) \) and is negative definite. In the sequel, taking the expectation with respect to \( v_{n} \) only results in

$$ E_{v} [a_{n} ] = Pr(a_{n} = 1) = \varPhi ({\mathbf{u}}_{n}^{\text{T}} {\varvec{\upalpha}}). $$
(50)

Using (50) and Assumptions A1–A2 yields (see [24] for details)

$$ E[\varGamma_{n}^{\prime } ({\varvec{\upalpha}})] = 0. $$
(51)

Substituting (51) into (49) yields

$$ E[{\boldsymbol{\tilde{\alpha }}}_{n} ] = ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ])E[{\boldsymbol{\tilde{\alpha }}}_{n - 1} ]. $$
(52)

To guarantee stability in the mean sense, the matrix \( ({\mathbf{I}}_{M} + \mu E[{\boldsymbol{\rm Z}}_{n} ]) \) should be stable. Since \( E[{\boldsymbol{\rm Z}}_{n} ] \) is negative definite, its eigenvalues are negative, and hence the step size should be selected according to

$$ 0 < \mu < - \frac{2}{{\lambda_{\hbox{min} } (E[{\boldsymbol{\rm Z}}_{n} ])}}. $$
(53)

Under this condition, we have

$$ E[{\boldsymbol{\hat{\alpha }}}_{\infty } ] = {\varvec{\upalpha}}. $$
(54)

In other words, Eq. (24) provides an unbiased estimate of \( {\varvec{\upalpha}} \) if the proposed algorithm is stable. Then, (40) and (41) can be rewritten, respectively, as

$$ \begin{aligned} \hat{\sigma }_{i,n} & = \hat{\sigma }_{i,n - 1} - \frac{\mu }{2}\frac{{\partial \vartheta_{{l_{0} {\text{-CRPFRMS}}}} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} )}}{{\partial \sigma_{i} }}|_{{\hat{\sigma }_{i,n - 1} }} \\ & = \hat{\sigma }_{i,n - 1} + \mu \varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ \end{aligned} $$
(55)
$$ \begin{aligned} {\mathbf{w}}_{n} & = {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1} \varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ) \\ & \approx {\mathbf{w}}_{n - 1} + \mu {\mathbf{G}}_{n - 1} \varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} ){\mathbf{u}}_{n} f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ) \\ & \quad - \kappa f_{\beta } ({\mathbf{w}}_{n - 1} ). \\ \end{aligned} $$
(56)

Combining (55) and (56) gives

$$ \begin{aligned} {\boldsymbol{\hat{\theta }}}_{n} & = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{n} } \\ {\hat{\sigma }_{i,n} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{n - 1} } \\ {\hat{\sigma }_{i,n - 1} } \\ \end{array} } \right] + \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ))\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n} ){\boldsymbol{\bar{G}}}_{n - 1} {\boldsymbol{\hat{h}}}_{n} \\ & \quad - \left[ {\begin{array}{*{20}c} {\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )} \\ 0 \\ \end{array} } \right] \\ \end{aligned} $$
(57)

where

$$ {\boldsymbol{\bar{G}}}_{n - 1} = \left[ {\begin{array}{*{20}c} {{\mathbf{G}}_{n - 1} } & 0 \\ 0 & 1 \\ \end{array} } \right] $$
(58)
$$ {\boldsymbol{\hat{h}}}_{n} = \left[ {\begin{array}{*{20}c} {\varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )u_{n} } \\ {\varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}}_{n} )} \\ \end{array} } \right]. $$
(59)

Then, the desired signal \( d_{n} \) can be written as

$$ d_{n} = {\mathbf{h}}_{n}^{\text{T}} {\varvec{\uptheta}}_{\text{opt}} + v_{n} $$
(60)

where

$$ {\varvec{\uptheta}}_{\text{opt}} = \left[ {\begin{array}{*{20}c} {{\mathbf{w}}_{o} } \\ {\sigma_{i} } \\ \end{array} } \right] $$
(61)
$$ {\mathbf{h}}_{n} = \left[ {\begin{array}{*{20}c} {\varPhi_{n} ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}})u_{n} } \\ {\varphi ({\mathbf{u}}_{n}^{\text{T}} {\boldsymbol{\hat{\alpha }}})} \\ \end{array} } \right]. $$
(62)

Using (60), (16) implies

$$ \varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}_{n - 1} ) \approx \varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\varvec{\upalpha}}) = d_{n} - {\mathbf{h}}_{n}^{\text{T}} {\boldsymbol{\hat{\theta }}}_{n - 1} = {\mathbf{h}}_{n}^{\text{T}} {\boldsymbol{\tilde{\theta }}}_{n - 1} + v_{n} $$
(63)

where

$$ {\boldsymbol{\tilde{\theta }}}_{n} = {\varvec{\uptheta}}_{\text{opt}} - {\boldsymbol{\hat{\theta }}}_{n} . $$
(64)

Subtracting (57) from \( {\varvec{\uptheta}}_{\text{opt}} \) and using (63) and (64) yields

$$ \begin{aligned} {\boldsymbol{\tilde{\theta }}}_{n} & = ({\mathbf{I}} - \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ){\boldsymbol{\tilde{\theta }}}_{n - 1} \\ & \quad - \mu f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}))v_{n} {\boldsymbol{\bar{G}}}_{n - 1} {\mathbf{h}}_{n} + \left[ {\begin{array}{*{20}c} {\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )} \\ 0 \\ \end{array} } \right] \\ \end{aligned} $$
(65)

where

$$ {\boldsymbol{\rm H}}_{n} = {\boldsymbol{\bar{G}}}_{n - 1} {\mathbf{h}}_{n} {\mathbf{h}}_{n}^{\text{T}} . $$
(66)

Then, taking the expectation of both sides of (65) gives

$$ \begin{aligned} E[{\boldsymbol{\tilde{\theta }}}_{n} ] & = ({\mathbf{I}} - \mu E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ])E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] \\ & \quad - \mu E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}}))]E[v_{n} ]E[{\boldsymbol{\bar{G}}}_{n - 1} ]E[{\mathbf{h}}_{n} ] + \left[ {\begin{array}{*{20}c} {E[\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )]} \\ 0 \\ \end{array} } \right] \\ & = ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} )E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] + \left[ {\begin{array}{*{20}c} {E[\kappa f_{\beta } ({\mathbf{w}}_{n - 1} )]} \\ 0 \\ \end{array} } \right] \\ & \approx ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} )E[{\boldsymbol{\tilde{\theta }}}_{n - 1} ] \\ \end{aligned} $$
(67)

where

$$ {\boldsymbol{\Re }}_{n} = E[f(\varepsilon_{n} ({\mathbf{w}},\sigma_{i,n - 1} ,{\boldsymbol{\hat{\alpha }}})){\boldsymbol{\rm H}}_{n} ]. $$
(68)

In order to ensure the convergence of the algorithm, all eigenvalues of \( ({\mathbf{I}} - \mu {\boldsymbol{\Re }}_{n} ) \) should lie inside the unit circle, i.e., \( |1 - \mu \lambda_{i} ({\boldsymbol{\Re }}_{n} )| < 1 \). Therefore, the range of \( \mu \) is

$$ 0 < \mu < \frac{2}{{\lambda_{\hbox{max} } ({\boldsymbol{\Re }}_{n} )}}. $$
(69)

Combining (53) and (69), we can obtain

$$ 0 < \mu < \hbox{min} \left( { - \frac{2}{{\lambda_{\hbox{min} } (E[{\boldsymbol{\rm Z}}_{n} ])}},\frac{2}{{\lambda_{\hbox{max} } ({\boldsymbol{\Re }}_{n} )}}} \right). $$
(70)

4 Simulation

4.1 Verify the Superiority of CR-FRMS

Firstly, simulations in the context of system identification are carried out to illustrate the advantage of the proposed algorithm. The unknown system is generated randomly with 8 taps, and the adaptive filter is assumed to have the same length as the unknown system. Figure 1a, b depicts the performance of the CR-FRMS, FRMS [24], TSA [11], and LMS [13] algorithms under two background noise distributions, respectively, where \( p = 2 \) and \( \kappa = 8 \times 10^{ - 6} \). The step sizes of the four algorithms are set to 0.03 in both mixed noise environments. Figure 1a shows the simulation results in a mixture of sub-Gaussian and super-Gaussian noises, and Fig. 1b shows the results in a mixture of sub-Gaussian noise and impulsive noise [the uniform and Laplace distributed noises are independent and identically distributed (i.i.d.) over time with zero mean]. The Bernoulli–Gaussian (BG) process [41] is frequently used to model the impulsive noise \( \xi_{n} \), formulated as \( \xi_{n} = \tau_{n} \upsilon_{n} \), where \( \tau_{n} \) is a Bernoulli process with probability \( p\{ \tau_{n} = 1\} = 0.01 \) and \( \upsilon_{n} \) is an i.i.d. zero-mean Gaussian sequence with variance \( \sigma = 10000 \). For a fair comparison, the parameters are set so that the algorithms have a comparable initial convergence speed. As observed in Fig. 1a, b, the proposed CR-FRMS algorithm is superior to the other existing algorithms. In addition, the MSD is defined as

$$ {\text{MSD}} = 10\log_{10} \left\| {{\mathbf{w}}_{o} - {\mathbf{w}}_{n} } \right\|^{2} $$
(71)
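
In code, (71) amounts to a one-line helper (Python/NumPy; w_o and w_n are the true and estimated weight vectors):

```python
import numpy as np

def msd_db(w_o, w_n):
    """MSD = 10*log10(||w_o - w_n||^2), Eq. (71)."""
    return 10.0 * np.log10(np.sum((w_o - w_n) ** 2))
```
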
Fig. 1

MSD curves for the proposed algorithm a the mixed uniform and Laplace distributed noises, where \( \varsigma = 0.5 \). b The mixed uniform distributed and impulsive noises, where \( \varsigma = 0.005 \)

The second experiment tests the convergence performance of CR-FRMS with different step sizes. Figure 2a, b illustrates the MSD learning curves for different step sizes with a white Gaussian input signal. As expected, when a fixed step size is applied in the CR-FRMS algorithm, there is a trade-off between the steady-state MSD and the convergence rate. That is, a small step size yields a lower steady-state error but slows down the convergence. In contrast, a large step size within the stable range provides faster convergence at the cost of a larger steady-state error.

Fig. 2

Comparison of different step sizes; a the mixed noise environment of sub-Gaussian and super-Gaussian. b The mixed noise environment of sub-Gaussian and impulsive noise

The third experiment tests the convergence performance of CR-FRMS with different values of the parameter \( p \). Suppose that the unknown system has 8 coefficients. The input signal is white Gaussian with unit variance, and the filter length is 8. The step size is fixed at \( 3 \times 10^{ - 2} \), while \( p \) is set to different values. After one hundred independent runs, it can be seen from Fig. 3a that, when the uniform and Laplace distributions have variances of 1 and 9, respectively, a larger \( p \) gives a smaller MSD, although the effect is not significant. However, as can be seen from Fig. 3b, in the case of mixed background noise consisting of impulsive noise and sub-Gaussian noise (uniform distribution) with a signal-to-noise ratio (SNR) of 30 dB, \( p = 2 \) achieves a lower MSD than \( p = 1 \) or \( p = 3 \).

Fig. 3

Comparison of different \( p \); a the mixed noise environment of sub-Gaussian and super-Gaussian noise. b The mixed noise environment of sub-Gaussian and impulsive noise

4.2 Verify the Superiority of \( l_{0} \)-CRPFRMS

The proposed \( l_{0} \)-CRPFRMS is compared with the CR-FRMS, PNLMS [9], and \( l_{0} \)-LMS [5] algorithms in two different background noise environments. The first is a mixed noise composed of a uniform distribution with variance \( \sigma_{\text{u}} = 2.5 \times 10^{ - 5} \) and a Laplace distribution with variance \( \sigma_{\text{L}} = 0.01 \). The second is a mixed noise composed of a sub-Gaussian noise (uniform distribution) with a signal-to-noise ratio of 30 dB and the impulsive noise. In both mixed background noise environments, \( p = 2 \). Suppose that the unknown system has 64 coefficients, of which two are nonzero (their locations and values are randomly selected). The input signal is white Gaussian with unit variance, and the filter length is 64. After fifty independent runs, the MSD curves are shown in Fig. 4a, b. It is clearly seen that the \( l_{0} \)-CRPFRMS algorithm converges faster than its counterparts.

Fig. 4

MSD curves for the proposed algorithm a the mixed sub-Gaussian and super-Gaussian noises, where \( \varsigma = 1 \times 10^{ - 6} \), \( \mu = 0.001 \), \( \kappa = 2 \times 10^{ - 6} \). b The mixed sub-Gaussian and impulsive noises, where \( \varsigma = 0.001 \), \( \mu = 0.005 \), \( \kappa = 1 \times 10^{ - 6} \)

It is also necessary to examine experimentally how different values of the parameter \( \kappa \) affect the proposed algorithm. Theoretically, a larger \( \kappa \) drives the updated weights toward zero faster, that is, it accelerates the convergence of the algorithm; however, it also increases the MSD. In this paper, the range of values of \( \kappa \) is relatively small. Therefore, as can be seen from Fig. 5a, b, the convergence speed is almost the same for the different values of \( \kappa \), but the MSD differs noticeably, that is, the smaller the \( \kappa \), the smaller the MSD.

Fig. 5

Comparison of different \( \kappa \) in a the mixed sub-Gaussian and super-Gaussian noises, b mixed sub-Gaussian and impulsive noises

5 Conclusion

In this paper, two algorithms based on censored regression, namely CR-FRMS and \( l_{0} \)-CRPFRMS, are proposed for two different mixed background noises. CR-FRMS shows superiority over LMS, TSA, and FRMS in terms of MSD. Meanwhile, when the unknown system is sparse, \( l_{0} \)-CRPFRMS exhibits a faster convergence speed than CR-FRMS without changing the steady-state MSD. The two algorithms have different advantages: CR-FRMS has lower computational complexity, while \( l_{0} \)-CRPFRMS converges faster. Therefore, in practice, a reasonable algorithm should be chosen according to the actual requirements.