
Consistency bounds and support recovery of d-stationary solutions of sparse sample average approximations

  • Miju Ahn

Abstract

This paper studies properties of the d(irectional)-stationary solutions of sparse sample average approximation problems involving difference-of-convex sparsity functions under a deterministic setting. These properties are investigated with respect to a vector that satisfies a verifiable assumption relating the empirical sample average approximation problem to the expectation minimization problem defined by an underlying data distribution. We derive bounds on the distance between the two vectors and on the difference of the model outcomes they generate. Furthermore, we study the inclusion relationships between their supports, that is, the sets of indices with nonzero values. We provide conditions under which the support of a d-stationary solution is contained within, and contains, the support of the vector of interest; the first kind of inclusion can be shown relative to any given set of indices. Some of the results presented herein are generalizations of the existing theory for the special case of \(\ell _1\)-norm regularized least-squares linear regression.
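To fix ideas, a generic member of the problem class treated here can be written as follows; the notation is schematic (the loss \(\ell\), the difference-of-convex surrogate \(P = g - h\), and the weight \(\lambda\) are placeholders rather than the paper's exact formulation):

\[
\min_{x \in X}\; \frac{1}{N}\sum_{i=1}^{N} \ell(x; z_i) \;+\; \lambda \sum_{j=1}^{n} P(|x_j|),
\qquad P = g - h \ \text{with } g, h \text{ convex},
\]

whose population counterpart replaces the sample average by the expectation \(\mathbb{E}_{z}\,\ell(x; z)\). A feasible point \(\bar{x} \in X\) is d(irectionally) stationary for an objective \(F\) if \(F'(\bar{x}; x - \bar{x}) \ge 0\) for every \(x \in X\), where \(F'(\bar{x}; \cdot)\) denotes the one-sided directional derivative. The results summarized above bound the distance between such a point and a reference vector tied to the expectation problem, and compare the supports of the two vectors.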

Keywords

Non-convex optimization · Sparse learning · Difference-of-convex program · Directional stationary solution

Acknowledgements

The author gratefully acknowledges Jong-Shi Pang for his involvement in fruitful discussions, and for providing valuable ideas that helped to build the foundation of this work.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Engineering Management, Information, and Systems, Southern Methodist University, Dallas, USA
