
Greedy Variance Estimation for the LASSO

  • Christopher Kennedy
  • Rachel Ward

Abstract

Recent results have proven the minimax optimality of LASSO and related algorithms for noisy linear regression. However, these results tend to rely on variance estimators that are inefficient, or on optimizations that are slower than LASSO itself. We propose an efficient estimator for the noise variance in high-dimensional linear regression that is faster than LASSO, requiring only p matrix–vector multiplications. We prove that this estimator is consistent, with a good rate of convergence, under the condition that the design matrix satisfies the restricted isometry property (RIP). In practice, our estimator scales well to high dimensions, is highly parallelizable, and incurs only a modest bias.
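The abstract describes the estimator only at a high level, so the sketch below is an illustrative stand-in rather than the authors' algorithm: a generic greedy (orthogonal-matching-pursuit-style) residual-based noise-variance estimate in NumPy, whose per-step cost is dominated by one X^T r matrix–vector product. The function name greedy_noise_variance, the known-sparsity input s, and the simulation parameters are all assumptions made for illustration.

    import numpy as np

    def greedy_noise_variance(X, y, s):
        """Illustrative greedy residual-based estimate of the noise variance.

        Greedily selects s columns of X (orthogonal-matching-pursuit style),
        projects y onto their span, and estimates sigma^2 from the residual
        sum of squares. Each step costs one X^T r matrix-vector product plus
        a small least-squares solve on the selected columns.
        """
        n = len(y)
        support = []
        r = y.copy()
        for _ in range(s):
            j = int(np.argmax(np.abs(X.T @ r)))      # column most correlated with residual
            if j not in support:
                support.append(j)
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            r = y - X[:, support] @ coef             # residual after projection
        # residual variance, corrected for the degrees of freedom used
        return float(np.sum(r ** 2) / max(n - len(support), 1))

    # usage on simulated sparse data with known noise level sigma
    rng = np.random.default_rng(0)
    n, p, s, sigma = 200, 1000, 5, 0.5
    X = rng.standard_normal((n, p)) / np.sqrt(n)     # roughly unit-norm columns (RIP-friendly)
    beta = np.zeros(p)
    beta[:s] = 3.0
    y = X @ beta + sigma * rng.standard_normal(n)
    print(greedy_noise_variance(X, y, s))            # should be near sigma**2 = 0.25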

Keywords

Estimate · Noise · Restricted isometry · Sparsity · Variance

Acknowledgements

We thank Abhinav Nellore for discussions on parameter selection in high-dimensional problems, which motivated this work. We thank Robert Tibshirani for directing us to the glmnet package for computing cv-LASSO. We also thank the anonymous referees for their feedback, which greatly improved the manuscript. R. Ward and C. Kennedy were partially supported during this work by NSF CAREER Grant #1255631.

References

  1. Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96(12), 6745–6750 (1999)
  2. Baraniuk, R., Davenport, M., DeVore, R., Wakin, M.: A simple proof of the restricted isometry property for random matrices. Constr. Approx. 28(3), 253–263 (2008)
  3. Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)
  4. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
  5. Candes, E.J., Davenport, M.A.: How well can we estimate a sparse vector? Appl. Comput. Harmon. Anal. 34(2), 317–323 (2013)
  6. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
  7. Chatterjee, S., Jafarov, J.: Prediction error of cross-validated lasso. arXiv:1502.06291 (2015)
  8. Dicker, L.H.: Variance estimation in high-dimensional linear models. Biometrika 101(2), 269–284 (2014)
  9. Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. R. Stat. Soc. B 74(1), 37–65 (2012)
  10. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
  11. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis. Birkhäuser, New York (2013)
  12. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
  13. Homrighausen, D., McDonald, D.: The lasso, persistence, and cross-validation. In: Proceedings of the International Conference on Machine Learning, pp. 1031–1039 (2013)
  14. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) 14, pp. 1137–1145 (1995)
  15. Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28, 1302–1338 (2000)
  16. Lounici, K., Pontil, M., Van De Geer, S., Tsybakov, A.B., et al.: Oracle inequalities and optimal inference under group sparsity. Ann. Stat. 39(4), 2164–2204 (2011)
  17. Meinshausen, N., Yu, B.: Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37, 246–270 (2009)
  18. Rauhut, H.: Compressive sensing and structured random matrices. Theor. Found. Numer. Methods Sparse Recov. 9, 1–92 (2010)
  19. Raskutti, G., Wainwright, M.J., Yu, B.: Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. IEEE Trans. Inf. Theory 57(10), 6976–6994 (2011)
  20. Reid, S., Tibshirani, R., Friedman, J.: A study of error variance estimation in lasso regression. Stat. Sin. 26, 35–67 (2016)
  21. Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure Appl. Math. 61(8), 1025–1045 (2008)
  22. Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)
  23. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
  24. Van de Geer, S.A.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36, 614–645 (2008)
  25. Van De Geer, S.A., Bühlmann, P., et al.: On the conditions used to prove oracle results for the lasso. Electron. J. Stat. 3, 1360–1392 (2009)
  26. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. arXiv:1011.3027 (2010)
  27. Verzelen, N., et al.: Minimax risks for sparse regressions: ultra-high dimensional phenomenons. Electron. J. Stat. 6, 38–90 (2012)
  28. Wainwright, M.J.: Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inf. Theory 55(12), 5728–5741 (2009)
  29. Ye, F., Zhang, C.-H.: Rate minimaxity of the lasso and Dantzig selector for the $\ell_q$ loss in $\ell_r$ balls. J. Mach. Learn. Res. 11, 3519–3540 (2010)
  30. Zhang, T., et al.: Some sharp performance bounds for least squares regression with $L_1$ regularization. Ann. Stat. 37(5A), 2109–2144 (2009)
  31. Zhang, C.-H., Huang, J.: The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Stat. 36, 1567–1594 (2008)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics, University of Texas at Austin, Austin, USA