Abstract
In many machine learning settings, such as nonnegative regression and box regression, the optimization variables are constrained. Therefore, one needs to find an optimal solution only over the region of the optimization space that satisfies these constraints. This region is referred to as the feasible region in optimization parlance. The straightforward use of a gradient-descent procedure does not work, because an unconstrained step might move the optimization variables outside the feasible region of the optimization problem.
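To make the issue concrete, the sketch below (a minimal illustration, not taken from the chapter) shows one standard remedy, projected gradient descent, applied to nonnegative regression: each unconstrained gradient step is followed by a projection back onto the feasible region, which for the nonnegative orthant simply clips negative coordinates to zero. The function name, step size, and iteration count are illustrative choices.

```python
import numpy as np

def projected_gradient_nnls(X, y, lr=1e-3, n_iters=5000):
    """Nonnegative least squares via projected gradient descent (illustrative)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)   # gradient of 0.5 * ||X w - y||^2
        w = w - lr * grad          # unconstrained step: may leave {w : w >= 0}
        w = np.maximum(w, 0.0)     # project back onto the feasible region
    return w

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)
print(projected_gradient_nnls(X, y))
```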
Notes
- 1.
- 2. As discussed in the previous section, this situation also arose with the hinge-loss SVM, where the constraint \(C - \alpha_i - \gamma_i = 0\) contains only dual variables. In that case, the constraint was implicitly incorporated into the formulation by using it to eliminate \(\gamma_i\) from the dual.
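Explicitly, since both dual variables are nonnegative, substituting \(\gamma_i = C - \alpha_i\) collapses the two nonnegativity constraints into the familiar box constraint on \(\alpha_i\) alone:
$$\displaystyle \begin{aligned} \gamma_i = C - \alpha_i \geq 0, \quad \alpha_i \geq 0 \quad \Longrightarrow \quad 0 \leq \alpha_i \leq C \end{aligned}$$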
- 3. Since the logarithm is concave, we know that:
$$\displaystyle \begin{aligned} \log[\lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)] \geq \lambda \log[f_i(\overline{w}_1)] + (1 - \lambda) \log[f_i(\overline{w}_2)] \end{aligned} \tag{6.18}$$
At the same time, we know that \(f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2) \geq \lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)\) because \(f_i(\cdot)\) is concave. Since the logarithm is an increasing function, we can take the logarithm of both sides to obtain \(\log[f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2)] \geq \log[\lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)]\). Combining this inequality with Equation 6.18 by transitivity shows that \(\log[f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2)] \geq \lambda \log[f_i(\overline{w}_1)] + (1 - \lambda) \log[f_i(\overline{w}_2)]\). In other words, \(\log(f_i(\cdot))\) is concave. More generally, these are exactly the steps required to show that the composition \(g(f(\cdot))\) of two concave functions is concave, as long as \(g(\cdot)\) is non-decreasing. Closely related results are available in Lemma 4.3.2. (A numerical spot-check of this composition result appears after these notes.)
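As a quick numerical sanity check (my own sketch, not from the chapter), the snippet below tests the composition result on one hypothetical concave, positive function, \(f(w) = 2 - w^2\) restricted to \([-1, 1]\): at random points and mixing weights, the concavity inequality for \(\log(f(\cdot))\) should always hold.

```python
import numpy as np

# Spot-check: if f is concave and positive, then log(f) is concave.
# f(w) = 2 - w**2 is concave and positive on [-1, 1] (an arbitrary choice).
f = lambda w: 2.0 - w**2

rng = np.random.default_rng(0)
for _ in range(10_000):
    w1, w2 = rng.uniform(-1.0, 1.0, size=2)   # stay where f > 0
    lam = rng.uniform(0.0, 1.0)
    lhs = np.log(f(lam * w1 + (1 - lam) * w2))
    rhs = lam * np.log(f(w1)) + (1 - lam) * np.log(f(w2))
    assert lhs >= rhs - 1e-12                  # concavity of log(f)
print("log(f) passed all concavity checks")
```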