Abstract
The growing need for dealing with big data has made it necessary to find computationally efficient methods for identifying important factors to be considered in statistical modeling. In the linear model, the Lasso is an effective way of selecting variables using penalized regression. It has spawned substantial research in the area of variable selection for models that depend on a linear combination of predictors. However, work addressing the lack of optimality of variable selection when the model errors are not Gaussian and/or when the data contain gross outliers is scarce. We propose the weighted signed-rank Lasso as a robust and efficient alternative to least absolute deviations and least squares Lasso. The approach is appealing for use with big data since one can use data augmentation to perform the estimation as a single weighted \(L_1\) optimization problem. Selection and estimation consistency are theoretically established and evaluated via simulation studies. The results confirm the optimality of the rank-based approach for data with heavy-tailed and contaminated errors or data containing high-leverage points.
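The data-augmentation idea mentioned in the abstract can be sketched concretely. With Wilcoxon scores, the signed-rank dispersion of the residuals is proportional to the L1 norm of the Walsh-average residuals \((e_i + e_j)/2\), \(i \leq j\), so the penalized fit reduces to a single weighted L1 regression after stacking the pairwise averages with one LAD-Lasso-style pseudo-row per coefficient carrying the penalty. The sketch below is ours, not the authors' implementation: the helper names `signed_rank_lasso` and `l1_fit` are hypothetical, uniform observation weights are assumed (in the weighted method these would downweight high-leverage points), and the LP solver is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import linprog


def l1_fit(A, y, w):
    """Weighted L1 regression: minimize sum_k w_k |y_k - A_k beta| via an LP.

    Variables are [beta (free), u >= 0, v >= 0] with residual = u - v."""
    m, p = A.shape
    c = np.concatenate([np.zeros(p), w, w])
    A_eq = np.hstack([A, np.eye(m), -np.eye(m)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


def signed_rank_lasso(X, y, lam, weights=None):
    """Illustrative signed-rank Lasso via data augmentation (hypothetical helper).

    The Wilcoxon signed-rank dispersion equals the L1 norm of the Walsh-average
    residuals, so the penalized criterion becomes one weighted L1 problem after
    augmenting with pairwise row averages plus one penalty pseudo-row per
    coefficient, as in the LAD-Lasso of Wang, Li, and Jiang (2007)."""
    n, p = X.shape
    i, j = np.triu_indices(n)            # all pairs i <= j, diagonal included
    Xa = (X[i] + X[j]) / 2.0             # Walsh averages of predictor rows
    ya = (y[i] + y[j]) / 2.0             # Walsh averages of responses
    if weights is None:
        weights = np.ones(len(ya))       # uniform weights for this sketch
    # Penalty pseudo-rows: each contributes lam * |beta_k| to the L1 objective.
    # In practice lam would be tuned and scaled with n.
    A = np.vstack([Xa, lam * np.eye(p)])
    b = np.concatenate([ya, np.zeros(p)])
    w = np.concatenate([weights, np.ones(p)])
    return l1_fit(A, b, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = X @ np.array([2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=40)
    print(signed_rank_lasso(X, y, lam=1.0))
```

Because the entire criterion, penalty included, is one weighted L1 problem, any off-the-shelf quantile-regression or LP solver can be reused unchanged, which is what makes the approach attractive at scale.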
References
Abebe, A., McKean, J. W., & Bindele, H. F. (2012). On the consistency of a class of nonlinear regression estimators. Pakistan Journal of Statistics and Operation Research, 8(3), 543–555.
Arslan, O. (2012). Weighted LAD-LASSO method for robust parameter estimation and variable selection in regression. Computational Statistics & Data Analysis, 56(6), 1952–1965.
Bindele, H. F., & Abebe, A. (2012). Bounded influence nonlinear signed-rank regression. Canadian Journal of Statistics, 40(1), 172–189.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
Hettmansperger, T. P., & McKean, J. W. (2011). Robust nonparametric statistical methods (2nd ed.). Monographs on Statistics and Applied Probability, Vol. 119. Boca Raton, FL: CRC Press.
Hössjer, O. (1994). Rank-based estimates in the linear model with high breakdown point. Journal of the American Statistical Association, 89(425), 149–158.
Johnson, B. A. (2009). Rank-based estimation in the ℓ1-regularized partly linear model for censored outcomes with application to integrated analyses of clinical predictors and gene expression data. Biostatistics, 10(4), 659–666.
Johnson, B. A., Lin, D., & Zeng, D. (2008). Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association, 103(482), 672–680.
Johnson, B. A., & Peng, L. (2008). Rank-based variable selection. Journal of Nonparametric Statistics, 20(3), 241–252.
Leng, C. (2010). Variable selection and coefficient estimation via regularized rank regression. Statistica Sinica, 20(1), 167.
Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(388), 871–880.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
Wang, H., & Leng, C. (2008). A note on adaptive group lasso. Computational Statistics & Data Analysis, 52(12), 5277–5286.
Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics, 25(3), 347–355.
Wang, L., & Li, R. (2009). Weighted Wilcoxon-type smoothly clipped absolute deviation method. Biometrics, 65(2), 564–571.
Wu, C. F. (1981). Asymptotic theory of nonlinear least squares estimation. Annals of Statistics, 9(3), 501–513.
Xu, J., Leng, C., & Ying, Z. (2010). Rank-based variable selection with censored data. Statistics and Computing, 20(2), 165–176.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Acknowledgements
We dedicate this work to Joseph W. McKean on the occasion of his 70th birthday. We are thankful for his mentorship and guidance over the years. We also thank the anonymous referee for suggestions that improved the presentation.
Appendix
This Appendix provides some lemmas and the proofs of the main results (Theorems 2.1 and 2.2). In the proofs we have taken W = I to simplify notation. The general case follows by taking \(W^{1/2}\boldsymbol{x}\) in place of \(\boldsymbol{x}\) in the proofs.
2.1.1 Proofs
The following three lemmas, whose proofs follow from slight modifications of those given in Hössjer (1994) and Hettmansperger and McKean (2011), are key to the proofs of the main results.
Lemma 2.1.
Under assumptions \((I_{1})\) and \((I_{2})\), we have \(\tilde{\boldsymbol{\beta }}_{n} \rightarrow \boldsymbol{\beta }_{0}\;a.s.\)
The proof of this lemma is given in Hössjer (1994) for w ≡ 1, and in Abebe et al. (2012) for any positive weight function w and a more general regression model. Also, as in Wu (1981), the proof of this lemma is obtained by showing that
where B is an open subset of \(\mathcal{B}\) and \(\boldsymbol{\beta }_{0} \in Int(B)\).
Lemma 2.2.
Putting \(U_{n}(\boldsymbol{\gamma },\boldsymbol{\beta }) = \frac{\|S_{n}(\boldsymbol{\gamma }) - S_{n}(\boldsymbol{\beta }) -\boldsymbol{\xi }(\boldsymbol{\gamma }) + \boldsymbol{\xi }(\boldsymbol{\beta })\|_{1}} {n^{-1/2} +\| \boldsymbol{\xi }(\boldsymbol{\gamma })\|_{1}}\) , we have for small enough δ > 0 that
This lemma ensures that \(n^{-1/2}S_{n}(\boldsymbol{\beta }_{0})\) converges in distribution to a multivariate normal distribution with mean zero and covariance matrix \(\gamma _{\varphi ^{+}}\varSigma\). It also yields the following asymptotic linearity, established in Hettmansperger and McKean (2011).
Lemma 2.3.
Under the assumption that the errors have finite Fisher information, we have, for all ε > 0 and C > 0,
From this asymptotic linearity, it follows that for all \(\boldsymbol{\beta }\) such that \(\|\boldsymbol{\beta }-\boldsymbol{\beta }_{0}\|_{1} \leq C/\sqrt{n}\), we have
Proof of Theorem 2.1.
Set \(B =\{\boldsymbol{\beta } _{0} + n^{-1/2}\mathbf{u}:\;\;\| \mathbf{u}\|_{1} < C\}\). Clearly B is an open neighborhood of \(\boldsymbol{\beta }_{0}\), and therefore \(B^{c}\) is a closed subset of \(\mathcal{B}\) not containing \(\boldsymbol{\beta }_{0}\). To complete the proof, it is then sufficient to show that
which from Lemma 1 of Wu (1981) will result in the \(\sqrt{n}\)-consistency of \(\hat{\boldsymbol{\beta }}_{n}\). Indeed,
Now, by the mean value theorem, assuming without loss of generality that \(\vert \beta _{0j}\vert < \vert \beta _{j}\vert\), there exists \(\alpha _{j} \in (\vert \beta _{0j}\vert,\vert \beta _{j}\vert )\) such that
and therefore
This, together with Eq. (2.13), implies that
as \(\boldsymbol{\beta }\in B^{c}\) implies that \(\boldsymbol{\beta }\) can be written as \(\boldsymbol{\beta }=\boldsymbol{\beta } _{0} + n^{-1/2}\mathbf{u}\) with \(\|\mathbf{u}\|_{1} \geq C\). Being a closed subset of a compact space, \(B^{c}\) is compact, and hence closed and bounded. Thus, there exists a constant M such that \(C \leq \|\mathbf{u}\|_{1} \leq M\). From the last term of Eq. (2.14), note that \(\sum _{j=1}^{p_{0} }\vert u_{j}\vert \leq \|\mathbf{u}\|_{1} \leq M\), from which we have \(-\sqrt{n}a_{n}\sum _{j=1}^{p_{0} }\vert u_{j}\vert \geq -\sqrt{n}a_{n}M\). Thus,
and so,
By assumption \((I_{3})\), \(\lim _{n\rightarrow \infty }\Big[\sqrt{n}a_{n}M\Big] = 0\), and by Lemma 2.1, we have
Proof of Theorem 2.2.
From the proof of Theorem 2.1, to obtain the oracle property it is sufficient to show that for any \(\boldsymbol{\beta }^{{\ast}}\) satisfying \(\|\boldsymbol{\beta }_{a}^{{\ast}}-\boldsymbol{\beta }_{0a}\|_{1} = O_{p}(n^{-1/2})\) and \(\vert \beta _{j}^{{\ast}}\vert < Cn^{-1/2}\) for \(j = p_{0} + 1,\ldots,d\), the quantities \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \beta _{j}} \Big\vert _{\boldsymbol{\beta }=\boldsymbol{\beta }^{{\ast}}}\) and \(\beta _{j}^{{\ast}}\) have the same sign. Indeed,
where \(S_{n}^{j}(\boldsymbol{\beta }_{0})\) is the \(j\)th component of \(S_{n}(\boldsymbol{\beta }_{0})\). Note that by assumption \((I_{3})\), \(\sqrt{n}H_{ \lambda _{j}}(\vert \beta _{j}^{{\ast}}\vert ) \geq \sqrt{n}b_{ n} \rightarrow \infty \) as n → ∞, and thus the sign of \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \beta _{j}} \Big\vert _{\boldsymbol{\beta }=\boldsymbol{\beta }^{{\ast}}}\) is fully determined by that of \(\beta _{j}^{{\ast}}\) for n large enough. This, together with Theorem 2.1, implies that \(\lim _{n\rightarrow \infty }P(\hat{\boldsymbol{\beta }}_{nb} = \mathbf{0}) = 1\).
Moreover, by definition of \(\hat{\boldsymbol{\beta }}_{n}\), it is obtained in a straightforward manner that \(\frac{\partial Q(\boldsymbol{\beta })} {\partial \boldsymbol{\beta }_{a}} \Big\vert _{\boldsymbol{\beta }=(\hat{\boldsymbol{\beta }}_{a},0)} = o_{P}(1)\). From this, partitioning \(S_{n}(\boldsymbol{\beta }_{0})\) as \((S_{n,a}(\boldsymbol{\beta }_{0}),S_{n,b}(\boldsymbol{\beta }_{0}))\), it follows from Eq. (2.12) that
and \(\vert \sqrt{n}\sum _{j=1}^{p_{0}}H_{\lambda _{ j}}(\vert \hat{\beta }_{na,j}\vert )\mbox{ sgn}(\hat{\beta }_{na,j})\vert \leq p_{0}\sqrt{n}a_{n} \rightarrow 0\) as n → ∞ by assumption \((I_{3})\). Hence,
As \(n^{-1/2}S_{n,a}(\boldsymbol{\beta }_{0})\mathop{\longrightarrow}\limits_{}^{\mathcal{D}}N\big(0,\ \gamma _{\varphi ^{+}}\varSigma _{a}\big)\), we have
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Abebe, A., Bindele, H.F. (2016). Robust Signed-Rank Variable Selection in Linear Regression. In: Liu, R., McKean, J. (eds) Robust Rank-Based and Nonparametric Methods. Springer Proceedings in Mathematics & Statistics, vol 168. Springer, Cham. https://doi.org/10.1007/978-3-319-39065-9_2
Print ISBN: 978-3-319-39063-5
Online ISBN: 978-3-319-39065-9