
Robust Signed-Rank Variable Selection in Linear Regression

  • Conference paper
Robust Rank-Based and Nonparametric Methods

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 168)

Abstract

The growing need for dealing with big data has made it necessary to find computationally efficient methods for identifying important factors to be considered in statistical modeling. In the linear model, the Lasso is an effective way of selecting variables using penalized regression. It has spawned substantial research in the area of variable selection for models that depend on a linear combination of predictors. However, work addressing the lack of optimality of variable selection when the model errors are not Gaussian and/or when the data contain gross outliers is scarce. We propose the weighted signed-rank Lasso as a robust and efficient alternative to least absolute deviations and least squares Lasso. The approach is appealing for use with big data since one can use data augmentation to perform the estimation as a single weighted \(L_{1}\) optimization problem. Selection and estimation consistency are theoretically established and evaluated via simulation studies. The results confirm the optimality of the rank-based approach for data with heavy-tailed and contaminated errors or data containing high-leverage points.
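To make the data-augmentation device concrete, the following is a minimal sketch, not the authors' implementation, assuming Wilcoxon signed-rank scores, for which the signed-rank dispersion of the residuals \(e_{i} = y_{i} -\boldsymbol{x}_{i}^{T}\boldsymbol{\beta }\) can be written, up to a constant, as the sum of the absolute Walsh sums \(\vert e_{i} + e_{j}\vert\), \(i \leq j\). The penalized fit then reduces to a single weighted LAD-lasso, solvable as a linear program; the function name and the row weights \(w_{i}w_{j}\) are illustrative assumptions, not taken from the paper.

    import numpy as np
    from scipy.optimize import linprog

    def signed_rank_lasso(X, y, lam, w=None):
        """Sketch: weighted signed-rank lasso as one weighted L1 program."""
        n, d = X.shape
        w = np.ones(n) if w is None else np.asarray(w, dtype=float)

        # Data augmentation: Walsh sums over all pairs i <= j.
        ii, jj = np.triu_indices(n)
        Xa = X[ii] + X[jj]                 # m x d design, m = n(n+1)/2
        ya = y[ii] + y[jj]
        wa = w[ii] * w[jj]                 # illustrative row weights (assumption)
        m = Xa.shape[0]

        # LP variables (beta_plus, beta_minus, t): beta = beta_plus - beta_minus
        # and t bounds |ya - Xa beta|; objective = weighted L1 loss + penalty.
        c = np.concatenate([np.full(d, n * lam), np.full(d, n * lam), wa])
        A_ub = np.block([[Xa, -Xa, -np.eye(m)],
                         [-Xa, Xa, -np.eye(m)]])
        b_ub = np.concatenate([ya, -ya])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        return res.x[:d] - res.x[d:2 * d]

    # Toy usage: sparse truth with heavy-tailed (t_2) errors.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.standard_t(2, size=50)
    print(np.round(signed_rank_lasso(X, y, lam=0.1), 2))

Since the augmentation has \(m = n(n + 1)/2\) rows, a practical large-\(n\) implementation would subsample pairs or hand the weighted \(L_{1}\) problem to a specialized solver.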

References

  • Abebe, A., McKean, J. W., & Bindele, H. F. (2012). On the consistency of a class of nonlinear regression estimators. Pakistan Journal of Statistics and Operation Research, 8(3), 543–555.

  • Arslan, O. (2012). Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Computational Statistics & Data Analysis, 56(6), 1952–1965.

  • Bindele, H. F., & Abebe, A. (2012). Bounded influence nonlinear signed-rank regression. Canadian Journal of Statistics, 40(1), 172–189.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  • Hettmansperger, T. P., & McKean, J. W. (2011). Robust nonparametric statistical methods. In Monographs on statistics and applied probability (Vol. 119, 2nd ed.). Boca Raton, FL: CRC Press.

  • Hössjer, O. (1994). Rank-based estimates in the linear model with high breakdown point. Journal of the American Statistical Association, 89(425), 149–158.

  • Johnson, B. A. (2009). Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data. Biostatistics, 10(4), 659–666.

  • Johnson, B. A., Lin, D., & Zeng, D. (2008). Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103(482), 672–680.

  • Johnson, B. A., & Peng, L. (2008). Rank-based variable selection. Journal of Nonparametric Statistics, 20(3), 241–252.

  • Leng, C. (2010). Variable selection and coefficient estimation via regularized rank regression. Statistica Sinica, 20(1), 167.

  • Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(388), 871–880.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • Wang, H., & Leng, C. (2008). A note on adaptive group lasso. Computational Statistics & Data Analysis, 52(12), 5277–5286.

  • Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the lad-lasso. Journal of Business & Economic Statistics, 25(3), 347–355.

  • Wang, L., & Li, R. (2009). Weighted Wilcoxon-type smoothly clipped absolute deviation method. Biometrics, 65(2), 564–571.

  • Wu, C. F. (1981). Asymptotic theory of nonlinear least squares estimation. Annals of Statistics, 9(3), 501–513.

  • Xu, J., Leng, C., & Ying, Z. (2010). Rank-based variable selection with censored data. Statistics and Computing, 20(2), 165–176.

  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

Acknowledgements

We dedicate this work to Joseph W. McKean on the occasion of his 70th birthday. We are thankful for his mentorship and guidance over the years. We also thank the anonymous referee for suggestions that improved the presentation.

Author information

Correspondence to Asheber Abebe.

Appendix

This Appendix provides some lemmas and the proofs of the main results (Theorems 2.1 and 2.2). In the proofs we take W = I to simplify notation; the general case follows by replacing \(\boldsymbol{x}\) with \(W^{1/2}\boldsymbol{x}\).

2.1.1 Proofs

The following three lemmas, whose proofs follow from slight modifications of those given in Hössjer (1994) and Hettmansperger and McKean (2011), are key to the proofs of the main results.

Lemma 2.1.

Under assumptions (\(I_{1}\)) and (\(I_{2}\)), we have \(\tilde{\boldsymbol{\beta }}_{n} \rightarrow \boldsymbol{\beta }_{0}\;a.s.\)

The proof of this lemma is given in Hössjer (1994) for w ≡ 1 and in Abebe et al. (2012) for any positive w and a more general regression model. As in Wu (1981), the proof is obtained by showing that

$$\displaystyle{ \lim _{n\rightarrow \infty }\inf _{\boldsymbol{\beta }\in B^{c}}\big(D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }) - D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0})\big) > 0\;\;a.s. }$$
(2.11)

where B is an open subset of \(\mathcal{B}\) and \(\boldsymbol{\beta }_{0} \in Int(B)\).

Lemma 2.2.

Putting \(U_{n}(\boldsymbol{\gamma },\boldsymbol{\beta }) = \frac{\|S_{n}(\boldsymbol{\gamma }) - S_{n}(\boldsymbol{\beta }) -\boldsymbol{\xi }(\boldsymbol{\gamma }) + \boldsymbol{\xi }(\boldsymbol{\beta })\|_{1}} {n^{-1/2} +\| \boldsymbol{\xi }(\boldsymbol{\gamma })\|_{1}}\) , we have for small enough δ > 0 that

$$\displaystyle{\sup _{\|\boldsymbol{\gamma }\|\leq \delta }U_{n}(\boldsymbol{\gamma },\boldsymbol{\beta }_{0})\mathop{\longrightarrow}\limits^{a.s.}0\quad \mbox{ as}\quad n \rightarrow \infty.}$$

This lemma ensures that \(n^{-1/2}S_{n}(\boldsymbol{\beta }_{0})\) converges in distribution to a multivariate normal distribution with mean zero and covariance matrix \(\gamma _{\varphi ^{+}}\varSigma\). It also yields the following asymptotic linearity result, established in Hettmansperger and McKean (2011).

Lemma 2.3.

Under the assumption that the errors have finite Fisher information, we have for all ε > 0 and C > 0

$$\displaystyle{P\left [\sup _{\sqrt{n}\|\boldsymbol{\beta }-\boldsymbol{\beta }_{0}\|_{1}\leq C}\|n^{-1/2}(S_{ n}(\boldsymbol{\beta })-S_{n}(\boldsymbol{\beta }_{0}))+\zeta _{\varphi ^{+}}\sqrt{n}(\boldsymbol{\beta }-\boldsymbol{\beta }_{0})\|_{1} \geq \epsilon \right ] \rightarrow 0\quad \mbox{ as $n \rightarrow \infty $}.}$$

From this asymptotic linearity it follows that for all \(\boldsymbol{\beta }\) such that \(\|\boldsymbol{\beta }-\boldsymbol{\beta }_{0}\|_{1} \leq C/\sqrt{n}\), we have

$$\displaystyle{ n^{-1/2}S_{ n}(\boldsymbol{\beta }) = n^{-1/2}S_{ n}(\boldsymbol{\beta }_{0}) -\zeta _{\varphi ^{+}}\sqrt{n}(\boldsymbol{\beta }-\boldsymbol{\beta }_{0}) + o(1) }$$
(2.12)
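Spelling out a step used repeatedly below: an unpenalized minimizer \(\tilde{\boldsymbol{\beta }}_{n}\) satisfies \(S_{n}(\tilde{\boldsymbol{\beta }}_{n}) = o_{P}(n^{1/2})\), so substituting it into Eq. (2.12) gives

$$\displaystyle{\sqrt{n}\big(\tilde{\boldsymbol{\beta }}_{n} -\boldsymbol{\beta }_{0}\big) =\zeta _{\varphi ^{+}}^{-1}n^{-1/2}S_{n}(\boldsymbol{\beta }_{0}) + o_{P}(1)\mathop{\longrightarrow}\limits^{\mathcal{D}}N\big(0,\ \zeta _{\varphi ^{+}}^{-2}\gamma _{\varphi ^{+}}\varSigma \big),}$$

which is exactly the limit recovered for the nonvanishing block \(\hat{\boldsymbol{\beta }}_{na}\) in the proof of Theorem 2.2, once the penalty terms are shown to be asymptotically negligible.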

Proof of Theorem 2.1.

Set \(B =\{\boldsymbol{\beta } _{0} + n^{-1/2}\mathbf{u}:\;\;\| \mathbf{u}\|_{1} < C\}\). Clearly \(B\) is an open neighborhood of \(\boldsymbol{\beta }_{0}\), and therefore \(B^{c}\) is a closed subset of \(\mathcal{B}\) not containing \(\boldsymbol{\beta }_{0}\). To complete the proof, it is then sufficient to show that

$$\displaystyle{\lim _{n\rightarrow \infty }\inf _{\boldsymbol{\beta }\in B^{c}}\big(Q(\boldsymbol{\beta }) - Q(\boldsymbol{\beta }_{0})\big) > 0\;\;a.s.}$$

which, by Lemma 1 of Wu (1981), yields the \(\sqrt{n}\)-consistency of \(\hat{\boldsymbol{\beta }}_{n}\). Indeed,

$$\displaystyle{ Q(\boldsymbol{\beta }) - Q(\boldsymbol{\beta }_{0}) = D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }) - D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0}) + n\sum _{j=1}^{d}\big[P_{\lambda _{ j}}(\vert \beta _{j}\vert ) - P_{\lambda _{j}}(\vert \beta _{0j}\vert )\big]. }$$
(2.13)

Now, by the mean value theorem, assuming without loss of generality that \(\vert \beta _{0j}\vert < \vert \beta _{j}\vert \), there exists \(\alpha _{j} \in (\vert \beta _{0j}\vert,\vert \beta _{j}\vert )\) such that

$$\displaystyle{P_{\lambda _{j}}(\vert \beta _{j}\vert ) - P_{\lambda _{j}}(\vert \beta _{0j}\vert ) = H_{\lambda _{j}}(\vert \alpha _{j}\vert )sgn(\alpha _{j})(\vert \beta _{j}\vert -\vert \beta _{0j}\vert ),}$$

and therefore

$$\displaystyle{\vert P_{\lambda _{j}}(\vert \beta _{j}\vert ) - P_{\lambda _{j}}(\vert \beta _{0j}\vert )\vert \leq H_{\lambda _{j}}(\vert \alpha _{j}\vert )\vert \beta _{j} -\beta _{0j}\vert.}$$

This, together with Eq. (2.13), implies that

$$\displaystyle\begin{array}{rcl} Q(\boldsymbol{\beta }) - Q(\boldsymbol{\beta }_{0})& =& D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }) - D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0}) + n\sum _{j=1}^{d}H_{\lambda _{ j}}(\vert \alpha _{j}\vert )sgn(\alpha _{j})(\vert \beta _{j}\vert -\vert \beta _{0j}\vert ) \\ & \geq & D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }) - D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0}) -\sqrt{n}a_{n}\sum _{j=1}^{p_{0} }\vert u_{j}\vert, {}\end{array}$$
(2.14)

as \(\boldsymbol{\beta }\in B^{c}\) implies that \(\boldsymbol{\beta }\) can be written as \(\boldsymbol{\beta }=\boldsymbol{\beta }_{0} + n^{-1/2}\mathbf{u}\) with \(\|\mathbf{u}\|_{1} \geq C\). Being a closed subset of the compact space \(\mathcal{B}\), \(B^{c}\) is compact, hence closed and bounded. Then there exists a constant \(M\) such that \(C \leq \|\mathbf{u}\|_{1} \leq M\). From the last term of Eq. (2.14), note that \(\sum _{j=1}^{p_{0}}\vert u_{j}\vert \leq \|\mathbf{u}\|_{1} \leq M\), from which we have \(-\sqrt{n}a_{n}\sum _{j=1}^{p_{0}}\vert u_{j}\vert \geq -\sqrt{n}a_{n}M\). Thus,

$$\displaystyle{Q(\boldsymbol{\beta }) - Q(\boldsymbol{\beta }_{0}) \geq D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }) - D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0}) -\sqrt{n}a_{n}M,}$$

and so,

$$\displaystyle{\lim _{n\rightarrow \infty }\inf _{\boldsymbol{\beta }\in B^{c}}\big(Q(\boldsymbol{\beta })-Q(\boldsymbol{\beta }_{0})\big) \geq \lim _{n\rightarrow \infty }\inf _{\boldsymbol{\beta }\in B^{c}}\big(D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta })-D_{n}(\mathbf{v}_{n},w,\boldsymbol{\beta }_{0})\big)-\lim _{n\rightarrow \infty }\Big[\sqrt{n}a_{n}M\Big].}$$

By assumption (\(I_{3}\)), \(\lim _{n\rightarrow \infty }\Big[\sqrt{n}a_{n}M\Big] = 0\), and by Lemma 2.1, we have

$$\displaystyle{\lim _{n\rightarrow \infty }\inf _{\boldsymbol{\beta }\in B^{c}}\big(Q(\boldsymbol{\beta }) - Q(\boldsymbol{\beta }_{0})\big) > 0\;\;a.s.}$$

Proof of Theorem 2.2.

Given the proof of Theorem 2.1, to obtain the oracle property it is sufficient to show that for any \(\boldsymbol{\beta }^{{\ast}}\) satisfying \(\|\boldsymbol{\beta }_{a}^{{\ast}}-\boldsymbol{\beta }_{0a}\|_{1} = O_{p}(n^{-1/2})\) and \(\vert \beta _{j}^{{\ast}}\vert < Cn^{-1/2}\) for \(j = p_{0} + 1,\ldots,d\), the derivative \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \beta _{j}} \Big\vert _{\boldsymbol{\beta }=\boldsymbol{\beta }^{{\ast}}}\) and \(\beta _{j}^{{\ast}}\) have the same sign. Indeed,

$$\displaystyle\begin{array}{rcl} n^{-1/2}\frac{\partial Q(\boldsymbol{\beta })} {\partial \beta _{j}} \Big\vert _{\boldsymbol{\beta }=\boldsymbol{\beta }^{{\ast}}}& =& -n^{-1/2}S_{ n}^{j}(\boldsymbol{\beta }_{ 0}) +\zeta _{\varphi ^{+}}\sqrt{n}(\boldsymbol{\beta }^{{\ast}}-\boldsymbol{\beta }_{ 0}) + \sqrt{n}H_{\lambda _{j}}(\vert \beta _{j}^{{\ast}}\vert )\mbox{ sgn}(\beta _{ j}^{{\ast}}) + o(1) {}\\ & =& O_{P}(1) + \sqrt{n}H_{\lambda _{j}}(\vert \beta _{j}^{{\ast}}\vert )\mbox{ sgn}(\beta _{ j}^{{\ast}})\;\;\mbox{ for $j = p_{ 0} + 1,\ldots,d$}, {}\\ \end{array}$$

where \(S_{n}^{j}(\boldsymbol{\beta }_{0})\) is the jth component of \(S_{n}(\boldsymbol{\beta }_{0})\). Note that by assumption (\(I_{3}\)), \(\sqrt{n}H_{\lambda _{j}}(\vert \beta _{j}^{{\ast}}\vert ) \geq \sqrt{n}b_{n} \rightarrow \infty \) as n → ∞, and thus the sign of \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \beta _{j}} \Big\vert _{\boldsymbol{\beta }=\boldsymbol{\beta }^{{\ast}}}\) is fully determined by that of \(\beta _{j}^{{\ast}}\) for n large enough. This, together with Theorem 2.1, implies that \(\lim _{n\rightarrow \infty }P(\hat{\boldsymbol{\beta }}_{nb} = \mathbf{0}) = 1\).

Moreover, by the definition of \(\hat{\boldsymbol{\beta }}_{n}\), it follows in a straightforward manner that \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \boldsymbol{\beta }_{a}} \Big\vert _{\boldsymbol{\beta }=(\hat{\boldsymbol{\beta }}_{a},0)} = o_{P}(1)\). From this, partitioning \(S_{n}(\boldsymbol{\beta }_{0})\) as \((S_{n,a}(\boldsymbol{\beta }_{0}),S_{n,b}(\boldsymbol{\beta }_{0}))\), it follows from Eq. (2.12) that

$$\displaystyle{o_{P}(1) = n^{-1/2}S_{ n,a}(\boldsymbol{\beta }_{0}) -\zeta _{\varphi ^{+}}\sqrt{n}(\hat{\boldsymbol{\beta }}_{na} -\boldsymbol{\beta }_{0a}) + \sqrt{n}\sum _{j=1}^{p_{0} }H_{\lambda _{j}}(\vert \hat{\beta }_{na,j}\vert )\mbox{ sgn}(\hat{\beta }_{na,j}),}$$

and \(\vert \sqrt{n}\sum _{j=1}^{p_{0}}H_{\lambda _{j}}(\vert \hat{\beta }_{na,j}\vert )\mbox{ sgn}(\hat{\beta }_{na,j})\vert \leq p_{0}\sqrt{n}a_{n} \rightarrow 0\) as n → ∞ by assumption (\(I_{3}\)). Hence,

$$\displaystyle{\sqrt{n}(\hat{\boldsymbol{\beta }}_{na} -\boldsymbol{\beta }_{0a}) =\zeta _{ \varphi ^{+}}^{-1}n^{-1/2}S_{ n,a}(\boldsymbol{\beta }_{0}) + o_{P}(1).}$$

As \(n^{-1/2}S_{n,a}(\boldsymbol{\beta }_{0})\mathop{\longrightarrow}\limits^{\mathcal{D}}N\big(0,\ \gamma _{\varphi ^{+}}\varSigma _{a}\big)\), we have

$$\displaystyle{\sqrt{n}\big(\hat{\boldsymbol{\beta }}_{na} -\boldsymbol{\beta }_{0a}\big)\mathop{\longrightarrow}\limits^{\mathcal{D}}N\big(0,\ \zeta _{\varphi ^{+}}^{-2}\gamma _{\varphi ^{+}}\varSigma _{a}\big).}$$
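As a numerical illustration of this limit (not part of the original proofs): with Wilcoxon scores, unit weights, and no penalty, the signed-rank fit is a median (LAD) regression on the Walsh sums, and for standard normal errors the asymptotic standard deviation of \(\sqrt{n}(\hat{\beta }_{n} -\beta _{0})\) for a standardized predictor reduces to the familiar Wilcoxon constant \(\tau = \sqrt{\pi /3}\approx 1.02\). A minimal simulation sketch, assuming the statsmodels median-regression solver:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, reps = 60, 300
    tau = np.sqrt(np.pi / 3)    # 1/(sqrt(12) * int f^2) for standard normal f
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)                    # standardized predictor
        y = 1.0 + 2.0 * x + rng.normal(size=n)    # true slope is 2
        ii, jj = np.triu_indices(n)               # Walsh-sum augmentation
        Xa = sm.add_constant(x[ii] + x[jj])
        ya = y[ii] + y[jj]
        slopes.append(sm.QuantReg(ya, Xa).fit(q=0.5).params[1])
    print(np.sqrt(n) * np.std(slopes, ddof=1))    # empirically close to tau

The printed value should be near 1.02 rather than the least-squares value 1.00, reflecting the small efficiency price (\(\pi /3 \approx 1.05\) in variance) paid at the normal model.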

Copyright information

© 2016 Springer International Publishing Switzerland

Cite this paper

Abebe, A., Bindele, H.F. (2016). Robust Signed-Rank Variable Selection in Linear Regression. In: Liu, R., McKean, J. (eds) Robust Rank-Based and Nonparametric Methods. Springer Proceedings in Mathematics & Statistics, vol 168. Springer, Cham. https://doi.org/10.1007/978-3-319-39065-9_2