Abstract
In many machine learning settings, such as nonnegative regression and box regression, the optimization variables are constrained. Therefore, one needs to find an optimal solution only over the region of the optimization space that satisfies these constraints. This region is referred to as the feasible region in optimization parlance. The straightforward use of a gradient-descent procedure does not work, because an unconstrained step might move the optimization variables outside the feasible region of the optimization problem.
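To make the issue concrete, the sketch below (a minimal illustration, not taken from the chapter) shows one standard remedy, projected gradient descent, applied to nonnegative regression: each unconstrained gradient step is followed by a projection back onto the feasible region, which for the nonnegative orthant simply clips negative coordinates to zero. The function name, step size, and iteration count are illustrative choices.

```python
import numpy as np

def projected_gradient_nnls(X, y, lr=1e-3, n_iters=5000):
    """Nonnegative least squares via projected gradient descent (illustrative)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)   # gradient of 0.5 * ||X w - y||^2
        w = w - lr * grad          # unconstrained step: may leave {w : w >= 0}
        w = np.maximum(w, 0.0)     # project back onto the feasible region
    return w

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)
print(projected_gradient_nnls(X, y))
```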
Notes
- 1.
- 2. As discussed in the previous section, this situation also arose with the hinge-loss SVM, where the constraint \(C - \alpha_i - \gamma_i = 0\) contains only dual variables. In that case, the constraint was implicitly incorporated into the formulation by using it to eliminate \(\gamma_i\) from the dual.
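Explicitly, since both dual variables are nonnegative, substituting \(\gamma_i = C - \alpha_i\) collapses the two nonnegativity constraints into the familiar box constraint on \(\alpha_i\) alone:
$$\displaystyle \begin{aligned} \gamma_i = C - \alpha_i \geq 0, \quad \alpha_i \geq 0 \quad \Longrightarrow \quad 0 \leq \alpha_i \leq C \end{aligned}$$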
- 3. Since the logarithm is concave, we know that:
$$\displaystyle \begin{aligned} \log[\lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)] \geq \lambda \log[f_i(\overline{w}_1)] + (1 - \lambda) \log[f_i(\overline{w}_2)] \end{aligned} \tag{6.18}$$
At the same time, we know that \(f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2) \geq \lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)\) because \(f_i(\cdot)\) is concave. Since the logarithm is an increasing function, we can take the logarithm of both sides to obtain \(\log[f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2)] \geq \log[\lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)]\). Combining this inequality with Equation 6.18 by transitivity shows that \(\log[f_i(\lambda \overline{w}_1 + (1 - \lambda) \overline{w}_2)] \geq \lambda \log[f_i(\overline{w}_1)] + (1 - \lambda) \log[f_i(\overline{w}_2)]\). In other words, \(\log(f_i(\cdot))\) is concave. More generally, these are exactly the steps required to show that the composition \(g(f(\cdot))\) of two concave functions is concave, as long as \(g(\cdot)\) is non-decreasing. Closely related results are available in Lemma 4.3.2. (A numerical spot-check of this composition result appears after these notes.)
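As a quick numerical sanity check (my own sketch, not from the chapter), the snippet below tests the composition result on one hypothetical concave, positive function, \(f(w) = 2 - w^2\) restricted to \([-1, 1]\): at random points and mixing weights, the concavity inequality for \(\log(f(\cdot))\) should always hold.

```python
import numpy as np

# Spot-check: if f is concave and positive, then log(f) is concave.
# f(w) = 2 - w**2 is concave and positive on [-1, 1] (an arbitrary choice).
f = lambda w: 2.0 - w**2

rng = np.random.default_rng(0)
for _ in range(10_000):
    w1, w2 = rng.uniform(-1.0, 1.0, size=2)   # stay where f > 0
    lam = rng.uniform(0.0, 1.0)
    lhs = np.log(f(lam * w1 + (1 - lam) * w2))
    rhs = lam * np.log(f(w1)) + (1 - lam) * np.log(f(w2))
    assert lhs >= rhs - 1e-12                  # concavity of log(f)
print("log(f) passed all concavity checks")
```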