
Minimizing a sum of clipped convex functions

  • Original Paper, published in Optimization Letters

Abstract

We consider the problem of minimizing a sum of clipped convex functions. Applications of this problem include clipped empirical risk minimization and clipped control. While the problem of minimizing the sum of clipped convex functions is NP-hard, we present some heuristics for approximately solving instances of these problems. These heuristics can be used to find good, if not global, solutions, and appear to work well in practice. We also describe an alternative formulation, based on the perspective transformation, that makes the problem amenable to mixed-integer convex programming and yields computationally tractable lower bounds. We illustrate our heuristic methods by applying them to various examples and use the perspective transformation to certify that the solutions are relatively close to the global optimum. This paper is accompanied by an open-source implementation.
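For reference (this is a reconstruction from the notation of the appendices below, not a quotation of the paper's own problem statement), the problem referred to as (1) has the form

$$\begin{aligned} \begin{array}{ll} \text{minimize} &{} f_0(x) + \displaystyle \sum _{i=1}^m \min \{f_i(x), \alpha _i\}, \end{array} \end{aligned}$$

with variable \(x \in \mathbf{R}^n\), where \(f_0, f_1, \ldots , f_m\) are convex and \(\alpha _1, \ldots , \alpha _m\) are the clipping thresholds.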


Notes

  1. If \(0 \notin \mathbf{dom}\, f_0\), replace \(\gamma f_0(x / \gamma )\) with \(\gamma f_0(y + x / \gamma )\) for any \(y \in \mathbf{dom}\, f_0\). See [19, Thm. 8.3] for more details.
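  For context, the perspective construction behind this note is standard (see [2, 19]) rather than specific to this paper: for a convex function \(f\), the function

  $$\begin{aligned} g(x, \gamma ) = \gamma f(x / \gamma ), \quad \gamma > 0, \end{aligned}$$

  is jointly convex in \((x, \gamma )\), and the shifted form above is one way of keeping this construction well behaved as \(\gamma \rightarrow 0\) when \(0 \notin \mathbf{dom}\, f_0\).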

References

  1. Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., Kolter, J.Z.: Differentiable convex optimization layers. In: Advances in Neural Information Processing Systems, pp. 9558–9570 (2019)

  2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  3. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  4. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  5. Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. 9(R2), 41–76 (1975)

  6. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Recent Advances in Learning and Control, pp. 95–110. Springer, London (2008)

  7. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124(1–2), 183–205 (2010)

  8. Günlük, O., Linderoth, J.: Perspective reformulation and applications. In: Mixed Integer Nonlinear Programming, pp. 61–89. Springer, New York (2012)

  9. Huber, P.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1(5), 799–821 (1973)

  10. Huber, P., Ronchetti, E.: Robust Statistics. Wiley, London (2009)

  11. Karp, R.: Reducibility among combinatorial problems. In: Complexity of Computer Computations. Springer, Boston (1972)

  12. Lan, G., Hou, C., Yi, D.: Robust feature selection via simultaneous capped \(\ell _2\)-norm and \(\ell _{2,1}\)-norm minimization. In: IEEE International Conference on Big Data Analysis, pp. 1–5 (2016)

  13. Lipp, T., Boyd, S.: Variations and extension of the convex-concave procedure. Optim. Eng. 17(2), 263–287 (2016)

  14. Liu, D., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)

  15. Liu, T., Jiang, H.: Minimizing sum of truncated convex functions and its applications. J. Comput. Graph. Stat. 28(1), 1–10 (2019)

  16. Moehle, N., Boyd, S.: A perspective-based convex relaxation for switched-affine optimal control. Syst. Control Lett. 86, 34–40 (2015)

  17. Ong, C., An, L.: Learning sparse classifiers with difference of convex functions algorithms. Optim. Methods Softw. 28(4), 830–854 (2013)

  18. Portilla, J., Tristan-Vega, A., Selesnick, I.: Efficient and robust image restoration using multiple-feature \(\ell _2\)-relaxed sparse analysis priors. IEEE Trans. Image Process. 24(12), 5046–5059 (2015)

  19. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  20. Safari, A.: An e-E-insensitive support vector regression machine. Comput. Stat. 29(6), 1447–1468 (2014)

  21. She, Y., Owen, A.: Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106(494), 626–639 (2011)

  22. Sun, Q., Xiang, S., Ye, J.: Robust principal component analysis via capped norms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 311–319 (2013)

  23. Suzumura, S., Ogawa, K., Sugiyama, M., Takeuchi, I.: Outlier path: a homotopy algorithm for robust SVM. In: International Conference on Machine Learning, pp. 1098–1106 (2014)

  24. Tao, P., An, L.: Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)

  25. Torr, P., Zisserman, A.: Robust computation and parametrization of multiple view relations. In: Sixth International Conference on Computer Vision, pp. 727–732 (1998)

  26. Xu, G., Hu, B.G., Principe, J.: Robust C-loss kernel classifiers. IEEE Trans. Neural Netw. Learn. Syst. 29(3), 510–522 (2016)

  27. Yu, Y.-L., Yang, M., Xu, L., White, M., Schuurmans, D.: Relaxed clipping: a global training method for robust regression and classification. In: Advances in Neural Information Processing Systems, pp. 2532–2540 (2010)

  28. Yuille, A., Rangarajan, A.: The concave-convex procedure. Neural Comput. 15(4), 915–936 (2003)

  29. Zhang, T.: Multi-stage convex relaxation for learning with sparse regularization. In: Advances in Neural Information Processing Systems, pp. 1929–1936 (2009)

  30. Zhang, T.: Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11, 1081–1107 (2010)


Acknowledgements

S. Barratt is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1656518.

Author information

Corresponding author: Shane Barratt.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Difference of convex formulation

In this section we observe that (1) can be expressed as a difference-of-convex (DC) programming problem.

Let \(h_i(x) = \max \{f_i(x) - \alpha _i,\, 0\}\). This (convex) function measures how far \(f_i(x)\) is above \(\alpha _i\). We can express the ith term in the sum as

$$\begin{aligned} \min \{f_i(x), \alpha _i\} = f_i(x) - h_i(x), \end{aligned}$$

since when \(f_i(x) \le \alpha _i\) we have \(h_i(x)=0\), and when \(f_i(x) > \alpha _i\) we have \(h_i(x)=f_i(x)-\alpha _i\). Since \(f_i\) and \(h_i\) are convex, (1) can be expressed as the DC programming problem

$$\begin{aligned} \begin{array}{ll} \text{minimize} &{} f_0(x) + \displaystyle \sum _{i=1}^m f_i(x) - \displaystyle \sum _{i=1}^m h_i(x), \end{array} \end{aligned}$$
(16)

with variable x. We can then apply well-known algorithms such as the convex-concave procedure [24, 28] to (approximately) solve (16).
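As a concrete illustration (a minimal sketch, not the paper's accompanying open-source implementation), the convex-concave procedure applied to (16) repeatedly replaces the concave part \(-\sum _i h_i\) by its affine upper bound at the current iterate and minimizes the resulting convex surrogate. The snippet below does this for clipped least squares, \(f_i(x) = (a_i^T x - b_i)^2\), with a small quadratic regularizer standing in for \(f_0\); the data and parameters are placeholders.

    import cvxpy as cp
    import numpy as np

    def ccp_clipped_lstsq(A, b, alpha, x0, lam=1e-3, iters=20):
        """Convex-concave procedure for minimizing
        lam*||x||^2 + sum_i min((a_i^T x - b_i)^2, alpha_i),
        with the clipped terms split as f_i - h_i as in Appendix A."""
        m, n = A.shape
        x_k = x0.copy()
        for _ in range(iters):
            r_k = A @ x_k - b                      # residuals at the current iterate
            active = (r_k ** 2 > alpha)            # terms that are currently clipped
            # A subgradient of h_i at x_k: zero if f_i(x_k) <= alpha_i,
            # otherwise 2 * (a_i^T x_k - b_i) * a_i.
            G = 2.0 * (r_k * active)[:, None] * A  # row i is a subgradient of h_i
            x = cp.Variable(n)
            f_sum = cp.sum_squares(A @ x - b)      # sum_i f_i(x)
            # Linearization of sum_i h_i around x_k; constants do not affect the argmin.
            h_lin = cp.sum(G @ (x - x_k))
            objective = lam * cp.sum_squares(x) + f_sum - h_lin
            cp.Problem(cp.Minimize(objective)).solve()
            x_k = x.value
        return x_k

    # Example usage with synthetic data; a few large residuals end up clipped.
    np.random.seed(0)
    A = np.random.randn(50, 5)
    b = A @ np.ones(5) + 0.1 * np.random.randn(50)
    b[:5] += 10.0
    x_hat = ccp_clipped_lstsq(A, b, alpha=np.ones(50), x0=np.zeros(5))

Each iteration solves a convex quadratic program, and because the linearized surrogate majorizes the DC objective (a standard property of the convex-concave procedure), the objective values of the iterates are nonincreasing.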

Appendix B: Minimal convex extension

If we replace each \(f_i\) with any function \({\tilde{f}}_i\) such that \({\tilde{f}}_i(x) = f_i(x)\) when \(f_i(x) \le \alpha _i\), we get an equivalent problem. One such \({\tilde{f}}_i\) is the minimal convex extension of \(f_i\), which is given by

$$\begin{aligned} {\tilde{f}}_i(x) := \sup \{f_i(z) + g^T (x-z) \mid g \in \partial f_i(z),\ f_i(z) \le \alpha _i,\ z \in \mathbf{R}^n\}. \end{aligned}$$

The minimal convex extension of a function is hard to compute in general, but it can be expressed analytically in some (important) special cases. For example, if \(f_i(x)=(a^T x - b)^2\), the minimal convex extension is the Huber penalty function

$$\begin{aligned} {\tilde{f}}_i(x) = {\left\{ \begin{array}{ll} (a^Tx - b)^2 &{} |a^Tx-b| \le \sqrt{\alpha _i}, \\ \sqrt{\alpha _i}\, (2|a^Tx-b| - \sqrt{\alpha _i}) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Using the minimal convex extension leads to an equivalent problem, but, depending on the algorithm, replacing \(f_i\) with \({\tilde{f}}_i\) can lead to better numerical performance.
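To make the quadratic example concrete, here is a small numerical sketch (illustrative only, with made-up data; it is not the paper's implementation) comparing the clipped term \(\min \{(a^Tx-b)^2, \alpha _i\}\) with its minimal convex extension, the Huber-type penalty above.

    import numpy as np

    def clipped_quadratic(x, a, b, alpha):
        """The (nonconvex) clipped term min((a^T x - b)^2, alpha)."""
        return min((a @ x - b) ** 2, alpha)

    def minimal_convex_extension(x, a, b, alpha):
        """Huber-type minimal convex extension of (a^T x - b)^2 clipped at alpha:
        quadratic while |a^T x - b| <= sqrt(alpha), then extends linearly."""
        r = abs(a @ x - b)
        t = np.sqrt(alpha)
        return r ** 2 if r <= t else t * (2.0 * r - t)

    a, b, alpha = np.array([1.0, -2.0]), 0.5, 4.0
    x_in = np.array([0.5, -0.25])   # |a^T x - b| = 0.5 <= sqrt(alpha) = 2
    x_out = np.array([5.0, 0.0])    # |a^T x - b| = 4.5 >  sqrt(alpha) = 2

    # The two functions agree wherever the quadratic is below the clip level...
    assert np.isclose(clipped_quadratic(x_in, a, b, alpha),
                      minimal_convex_extension(x_in, a, b, alpha))
    # ...and differ above it: 4.0 (clipped) versus 14.0 (linear extension).
    print(clipped_quadratic(x_out, a, b, alpha),
          minimal_convex_extension(x_out, a, b, alpha))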


About this article


Cite this article

Barratt, S., Angeris, G. & Boyd, S. Minimizing a sum of clipped convex functions. Optim Lett 14, 2443–2459 (2020). https://doi.org/10.1007/s11590-020-01565-4

