
Minimizing a sum of clipped convex functions

  • Original Paper, published in Optimization Letters

Abstract

We consider the problem of minimizing a sum of clipped convex functions. Applications of this problem include clipped empirical risk minimization and clipped control. While the problem of minimizing the sum of clipped convex functions is NP-hard, we present some heuristics for approximately solving instances of these problems. These heuristics can be used to find good, if not global, solutions, and appear to work well in practice. We also describe an alternative formulation, based on the perspective transformation, that makes the problem amenable to mixed-integer convex programming and yields computationally tractable lower bounds. We illustrate our heuristic methods by applying them to various examples and use the perspective transformation to certify that the solutions are relatively close to the global optimum. This paper is accompanied by an open-source implementation.
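For reference (this is a reconstruction from the notation of the appendices below, not a quotation of the paper's own problem statement), the problem referred to as (1) has the form

$$\begin{aligned} \begin{array}{ll} \text{minimize} &{} f_0(x) + \displaystyle \sum _{i=1}^m \min \{f_i(x), \alpha _i\}, \end{array} \end{aligned}$$

with variable \(x \in \mathbf{R}^n\), where \(f_0, f_1, \ldots , f_m\) are convex and \(\alpha _1, \ldots , \alpha _m\) are the clipping thresholds.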


Notes

  1. If \(0 \notin \mathbf{dom}\, f_0\), replace \(\gamma f_0(x / \gamma )\) with \(\gamma f_0(y + x / \gamma )\) for any \(y \in \mathbf{dom}\, f_0\). See [19, Thm. 8.3] for more details.
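  For context, the perspective construction behind this note is standard (see [2, 19]) rather than specific to this paper: for a convex function \(f\), the function

  $$\begin{aligned} g(x, \gamma ) = \gamma f(x / \gamma ), \quad \gamma > 0, \end{aligned}$$

  is jointly convex in \((x, \gamma )\), and the shifted form above is one way of keeping this construction well behaved as \(\gamma \rightarrow 0\) when \(0 \notin \mathbf{dom}\, f_0\).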

References

  1. Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., Kolter, J.Z.: Differentiable convex optimization layers. In: Advances in Neural Information Processing Systems, pp. 9558–9570 (2019)

  2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  3. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  4. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  5. Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. 9(R2), 41–76 (1975)

  6. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Recent Advances in Learning and Control, pp. 95–110. Springer, London (2008)

  7. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124(1–2), 183–205 (2010)

  8. Günlük, O., Linderoth, J.: Perspective reformulation and applications. In: Mixed Integer Nonlinear Programming, pp. 61–89. Springer, New York (2012)

  9. Huber, P.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1(5), 799–821 (1973)

  10. Huber, P., Ronchetti, E.: Robust Statistics. Wiley, London (2009)

  11. Karp, R.: Reducibility among combinatorial problems. In: Complexity of Computer Computations. Springer, Boston (1972)

  12. Lan, G., Hou, C., Yi, D.: Robust feature selection via simultaneous capped \(\ell _2\)-norm and \(\ell _{2,1}\)-norm minimization. In: IEEE International Conference on Big Data Analysis, pp. 1–5 (2016)

  13. Lipp, T., Boyd, S.: Variations and extension of the convex-concave procedure. Optim. Eng. 17(2), 263–287 (2016)

  14. Liu, D., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)

  15. Liu, T., Jiang, H.: Minimizing sum of truncated convex functions and its applications. J. Comput. Graph. Stat. 28(1), 1–10 (2019)

  16. Moehle, N., Boyd, S.: A perspective-based convex relaxation for switched-affine optimal control. Syst. Control Lett. 86, 34–40 (2015)

  17. Ong, C., An, L.: Learning sparse classifiers with difference of convex functions algorithms. Optim. Methods Softw. 28(4), 830–854 (2013)

  18. Portilla, J., Tristan-Vega, A., Selesnick, I.: Efficient and robust image restoration using multiple-feature \(\ell _2\)-relaxed sparse analysis priors. IEEE Trans. Image Process. 24(12), 5046–5059 (2015)

  19. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  20. Safari, A.: An e-E-insensitive support vector regression machine. Comput. Stat. 29(6), 1447–1468 (2014)

  21. She, Y., Owen, A.: Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106(494), 626–639 (2011)

  22. Sun, Q., Xiang, S., Ye, J.: Robust principal component analysis via capped norms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 311–319 (2013)

  23. Suzumura, S., Ogawa, K., Sugiyama, M., Takeuchi, I.: Outlier path: a homotopy algorithm for robust SVM. In: International Conference on Machine Learning, pp. 1098–1106 (2014)

  24. Tao, P., An, L.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)

  25. Torr, P., Zisserman, A.: Robust computation and parametrization of multiple view relations. In: Sixth International Conference on Computer Vision, pp. 727–732 (1998)

  26. Xu, G., Hu, B.G., Principe, J.: Robust C-loss kernel classifiers. IEEE Trans. Neural Netw. Learn. Syst. 29(3), 510–522 (2016)

  27. Yu, Y.-L., Yang, M., Xu, L., White, M., Schuurmans, D.: Relaxed clipping: a global training method for robust regression and classification. In: Advances in Neural Information Processing Systems, pp. 2532–2540 (2010)

  28. Yuille, A., Rangarajan, A.: The concave-convex procedure. Neural Comput. 15(4), 915–936 (2003)

  29. Zhang, T.: Multi-stage convex relaxation for learning with sparse regularization. In: Advances in Neural Information Processing Systems, pp. 1929–1936 (2009)

  30. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)


Acknowledgements

S. Barratt is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1656518.

Author information

Corresponding author: Shane Barratt.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Difference of convex formulation

In this section we observe that (1) can be expressed as a difference-of-convex (DC) programming problem.

Let \(h_i(x) = \max \{f_i(x) - \alpha _i,\, 0\}\). This (convex) function measures how far \(f_i(x)\) is above \(\alpha _i\). We can express the ith term in the sum as

$$\begin{aligned} \min \{f_i(x), \alpha _i\} = f_i(x) - h_i(x), \end{aligned}$$

since when \(f_i(x) \le \alpha _i\) we have \(h_i(x)=0\), and when \(f_i(x) > \alpha _i\) we have \(h_i(x)=f_i(x)-\alpha _i\). Since \(f_i\) and \(h_i\) are convex, (1) can be expressed as the DC programming problem

$$\begin{aligned} \begin{array}{ll} \text{minimize} &{} f_0(x) + \displaystyle \sum _{i=1}^m f_i(x) - \displaystyle \sum _{i=1}^m h_i(x), \end{array} \end{aligned}$$
(16)

with variable x. We can then apply well-known algorithms such as the convex-concave procedure [24, 28] to (approximately) solve (16).
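As a concrete illustration (a minimal sketch, not the paper's accompanying open-source implementation), the convex-concave procedure applied to (16) repeatedly replaces the concave part \(-\sum _i h_i\) by its affine upper bound at the current iterate and minimizes the resulting convex surrogate. The snippet below does this for clipped least squares, \(f_i(x) = (a_i^T x - b_i)^2\), with a small quadratic regularizer standing in for \(f_0\); the data and parameters are placeholders.

    import cvxpy as cp
    import numpy as np

    def ccp_clipped_lstsq(A, b, alpha, x0, lam=1e-3, iters=20):
        """Convex-concave procedure for minimizing
        lam*||x||^2 + sum_i min((a_i^T x - b_i)^2, alpha_i),
        with the clipped terms split as f_i - h_i as in Appendix A."""
        m, n = A.shape
        x_k = x0.copy()
        for _ in range(iters):
            r_k = A @ x_k - b                      # residuals at the current iterate
            active = (r_k ** 2 > alpha)            # terms that are currently clipped
            # A subgradient of h_i at x_k: zero if f_i(x_k) <= alpha_i,
            # otherwise 2 * (a_i^T x_k - b_i) * a_i.
            G = 2.0 * (r_k * active)[:, None] * A  # row i is a subgradient of h_i
            x = cp.Variable(n)
            f_sum = cp.sum_squares(A @ x - b)      # sum_i f_i(x)
            # Linearization of sum_i h_i around x_k; constants do not affect the argmin.
            h_lin = cp.sum(G @ (x - x_k))
            objective = lam * cp.sum_squares(x) + f_sum - h_lin
            cp.Problem(cp.Minimize(objective)).solve()
            x_k = x.value
        return x_k

    # Example usage with synthetic data; a few large residuals end up clipped.
    np.random.seed(0)
    A = np.random.randn(50, 5)
    b = A @ np.ones(5) + 0.1 * np.random.randn(50)
    b[:5] += 10.0
    x_hat = ccp_clipped_lstsq(A, b, alpha=np.ones(50), x0=np.zeros(5))

Each iteration solves a convex quadratic program, and because the linearized surrogate majorizes the DC objective (a standard property of the convex-concave procedure), the objective values of the iterates are nonincreasing.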

Appendix B: Minimal convex extension

If we replace each \(f_i\) with any function \({\tilde{f}}_i\) such that \({\tilde{f}}_i(x) = f_i(x)\) when \(f_i(x) \le \alpha _i\), we get an equivalent problem. One such \({\tilde{f}}_i\) is the minimal convex extension of \(f_i\), which is given by

$$\begin{aligned} {\tilde{f}}_i(x) := \sup \{f_i(z) + g^T (x-z) \mid g \in \partial f_i(z),\ f_i(z) \le \alpha _i,\ z \in \mathbf{R}^n\}. \end{aligned}$$

The minimal convex extension of a function is hard to compute in general, but it can be expressed analytically in some (important) special cases. For example, if \(f_i(x)=(a^T x - b)^2\), the minimal convex extension is the Huber penalty function

$$\begin{aligned} {\tilde{f}}_i(x) = {\left\{ \begin{array}{ll} (a^Tx - b)^2 &{} |a^Tx-b| \le \sqrt{\alpha _i}, \\ \sqrt{\alpha _i}\, (2|a^Tx-b| - \sqrt{\alpha _i}) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Using the minimal convex extension leads to an equivalent problem, but, depending on the algorithm, replacing \(f_i\) with \({\tilde{f}}_i\) can lead to better numerical performance.
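To make the quadratic example concrete, here is a small numerical sketch (illustrative only, with made-up data; it is not the paper's implementation) comparing the clipped term \(\min \{(a^Tx-b)^2, \alpha _i\}\) with its minimal convex extension, the Huber-type penalty above.

    import numpy as np

    def clipped_quadratic(x, a, b, alpha):
        """The (nonconvex) clipped term min((a^T x - b)^2, alpha)."""
        return min((a @ x - b) ** 2, alpha)

    def minimal_convex_extension(x, a, b, alpha):
        """Huber-type minimal convex extension of (a^T x - b)^2 clipped at alpha:
        quadratic while |a^T x - b| <= sqrt(alpha), then extends linearly."""
        r = abs(a @ x - b)
        t = np.sqrt(alpha)
        return r ** 2 if r <= t else t * (2.0 * r - t)

    a, b, alpha = np.array([1.0, -2.0]), 0.5, 4.0
    x_in = np.array([0.5, -0.25])   # |a^T x - b| = 0.5 <= sqrt(alpha) = 2
    x_out = np.array([5.0, 0.0])    # |a^T x - b| = 4.5 >  sqrt(alpha) = 2

    # The two functions agree wherever the quadratic is below the clip level...
    assert np.isclose(clipped_quadratic(x_in, a, b, alpha),
                      minimal_convex_extension(x_in, a, b, alpha))
    # ...and differ above it: 4.0 (clipped) versus 14.0 (linear extension).
    print(clipped_quadratic(x_out, a, b, alpha),
          minimal_convex_extension(x_out, a, b, alpha))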


About this article


Cite this article

Barratt, S., Angeris, G. & Boyd, S. Minimizing a sum of clipped convex functions. Optim Lett 14, 2443–2459 (2020). https://doi.org/10.1007/s11590-020-01565-4

