# Consistency bounds and support recovery of d-stationary solutions of sparse sample average approximations


## Abstract

This paper studies properties of the d(irectional)-stationary solutions of sparse sample average approximation problems involving difference-of-convex sparsity functions under a deterministic setting. These properties are investigated with respect to a vector that satisfies a verifiable assumption relating the empirical sample average approximation problem to the expectation minimization problem defined by an underlying data distribution. We derive bounds on the distance between the two vectors and on the difference of the model outcomes they generate. Furthermore, we study the inclusion relationships between their supports, i.e., their sets of indices with nonzero values. We provide conditions under which the support of a d-stationary solution is contained within, and contains, the support of the vector of interest; the first kind of inclusion can be shown for any arbitrary set of indices. Some of the results presented herein are generalizations of the existing theory for the specialized problem of \(\ell _1\)-norm regularized least squares minimization in linear regression.
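The specialized case mentioned above, \(\ell _1\)-norm regularized least squares, admits a compact numerical illustration. The sketch below (not taken from the paper; all parameter values are illustrative) solves the lasso problem by proximal gradient descent (ISTA) and compares the support of the computed stationary point with the support of a known sparse ground-truth vector, mirroring the kind of support-inclusion statement studied here.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=5000):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Synthetic sparse linear regression instance (illustrative sizes).
rng = np.random.default_rng(0)
n, p, s = 100, 20, 3
A = rng.standard_normal((n, p)) / np.sqrt(n)   # columns of roughly unit norm
x_true = np.zeros(p)
x_true[:s] = 1.0                               # true support = {0, 1, 2}
b = A @ x_true + 0.01 * rng.standard_normal(n)

x_hat = lasso_ista(A, b, lam=0.05)
supp_true = set(np.flatnonzero(x_true))
supp_hat = set(np.flatnonzero(np.abs(x_hat) > 1e-8))
print("estimated support:", sorted(supp_hat))
print("contains true support:", supp_true.issubset(supp_hat))
```

Because the nonzero coefficients (1.0) are large relative to the regularization weight and the noise, the true support is recovered inside the estimated one; with a smaller signal-to-regularization ratio this inclusion can fail, which is precisely the regime the paper's conditions address.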

## Keywords

Non-convex optimization · Sparse learning · Difference-of-convex program · Directional stationary solution

## Notes

### Acknowledgements

The author gratefully acknowledges Jong-Shi Pang for his involvement in fruitful discussions, and for providing valuable ideas that helped to build the foundation of this work.
