Invexity Preserving Transformations for Projection Free Optimization with Sparsity Inducing Non-convex Constraints

  • Sebastian Mathias Keller
  • Damian Murezzan
  • Volker Roth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11269)


Forward stagewise and Frank-Wolfe are popular gradient-based, projection-free optimization algorithms, both of which require convex constraints. We propose a method to extend the applicability of these algorithms to problems of the form \(\min _x f(x) \quad \text{s.t.} \quad g(x) \le \kappa \), where f(x) is an invex objective function and g(x) is a non-convex constraint. (Invexity is a generalization of convexity and ensures that all local optima are also global optima.) We provide a theorem which defines a class of monotone component-wise transformation functions \(x_i = h(z_i)\). These transformations lead to a convex constraint function \(G(z) = g(h(z))\). If the original function f(x) is invex, the same transformation \(x_i = h(z_i)\) leads to a transformed objective function \(F(z) = f(h(z))\) which is also invex. For algorithms that rely on a non-zero gradient \(\nabla F\) to produce new update steps, invexity ensures that they keep making progress as long as a descent direction exists.
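
As a minimal illustration of the constraint part of this idea, the sketch below uses one assumed example that is not taken from the paper: the non-convex constraint \(g(x) = \sum_i |x_i|^q\) with \(q = 0.5\), and the monotone component-wise map \(h(z_i) = \mathrm{sign}(z_i)|z_i|^{1/q}\). With these choices \(G(z) = g(h(z))\) reduces to the convex \(\ell_1\) norm of z. The names `Q`, `g`, `h`, `G` and `midpoint_gap` are hypothetical, and the sketch does not check whether this particular h belongs to the class defined by the theorem or whether it preserves invexity of an objective.

```python
import numpy as np

Q = 0.5  # assumed exponent of the non-convex constraint, 0 < Q < 1


def g(x, q=Q):
    """Assumed non-convex sparsity-inducing constraint: sum_i |x_i|^q."""
    return np.sum(np.abs(x) ** q)


def h(z, q=Q):
    """Assumed monotone component-wise map x_i = sign(z_i) * |z_i|^(1/q)."""
    return np.sign(z) * np.abs(z) ** (1.0 / q)


def G(z, q=Q):
    """Transformed constraint G(z) = g(h(z)); here it equals sum_i |z_i|."""
    return g(h(z), q)


def midpoint_gap(func, a, b):
    """func at the midpoint minus the mean of func at the endpoints.

    A strictly positive value shows that func is not convex.
    """
    return func(0.5 * (a + b)) - 0.5 * (func(a) + func(b))


a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

gap_g = midpoint_gap(g, a, b)  # positive: g violates midpoint convexity
gap_G = midpoint_gap(G, a, b)  # not positive (up to rounding): consistent with convexity

print(f"midpoint gap of g: {gap_g:.4f}")
print(f"midpoint gap of G: {gap_G:.4f}")
```

A single midpoint check can only disprove convexity, not prove it; for G the convexity follows from the identity \(G(z) = \sum_i |z_i|\), which holds for this assumed pair of g and h.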



This project is supported by the Swiss National Science Foundation project CR32I2 159682.

Supplementary material

Supplementary material 1: 480455_1_En_47_MOESM1_ESM.pdf (PDF, 289 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Basel, Basel, Switzerland
