Constrained Optimization and Duality

Abstract

In many machine learning settings, such as nonnegative regression and box regression, the optimization variables are constrained. An optimal solution must therefore be sought only over the portion of the optimization space that satisfies these constraints, which is referred to as the feasible region in optimization parlance. The straightforward use of a gradient-descent procedure does not work, because an unconstrained step might move the optimization variables outside the feasible region.
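
A minimal sketch of this difficulty, and of the standard remedy of projecting each iterate back onto the feasible region, is shown below for a nonnegative least-squares problem. The data, step size, and iteration count are hypothetical choices for illustration and are not taken from the chapter.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical instance of min_w ||Xw - y||^2 subject to w >= 0.
    X = rng.standard_normal((100, 5))
    w_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
    y = X @ w_true + 0.01 * rng.standard_normal(100)

    w = np.zeros(5)
    step = 1e-3
    for _ in range(2000):
        grad = 2 * X.T @ (X @ w - y)   # gradient of the squared loss
        w = w - step * grad            # unconstrained step; may violate w >= 0
        w = np.maximum(w, 0.0)         # project back onto the feasible region

    print(np.round(w, 2))              # should approximately recover w_true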

Notes

  1.

    The default definition of the projection matrix (cf. Equation 2.17) always projects onto the span of the columns of A; it is a column-wise projection matrix. Here, we project onto the span of the rows of A, and therefore the formula of Equation 2.17 has been modified by transposing A.
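
    For concreteness, a sketch assuming that Equation 2.17 is the standard full-rank projection formula (the equation itself is not reproduced on this page): the column-wise projection matrix and the row-wise variant obtained by transposing A would be

    $$\displaystyle \begin{aligned} P_{\mbox{col}} = A(A^TA)^{-1}A^T, \qquad P_{\mbox{row}} = A^T(AA^T)^{-1}A, \end{aligned} $$

    where the former projects onto the column space of A and the latter onto its row space.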

  2.

    As discussed in the previous section, this situation also arose with the hinge-loss SVM, where the constraint \(C - \alpha _i - \gamma _i = 0\) contains only dual variables. In that case, the constraint was implicitly included in the formulation by using it to eliminate \(\gamma _i\) from the dual.
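
    To make the elimination explicit, a sketch assuming the standard soft-margin dual, in which \(\alpha _i \geq 0\) and \(\gamma _i \geq 0\) are the multipliers of the margin and slack constraints, respectively: substituting \(\gamma _i = C - \alpha _i\) and retaining the nonnegativity of \(\gamma _i\) leaves only a box constraint on \(\alpha _i\):

    $$\displaystyle \begin{aligned} \gamma_i = C - \alpha_i \geq 0, \quad \alpha_i \geq 0 \;\; \Rightarrow \;\; 0 \leq \alpha_i \leq C \end{aligned} $$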

  3.

    Since the logarithm is concave, we know that:

    $$\displaystyle \begin{aligned} \mbox{log}[\lambda f_i(\overline{w}_1) + (1 - \lambda) f_i(\overline{w}_2)] \geq \lambda \mbox{log}[f_i(\overline{w}_1)]+ (1- \lambda) \mbox{log}[f_i(\overline{w}_2)] {} \end{aligned} $$
    (6.18)

    At the same time, we know that \(f_i(\lambda \overline{w}_1 + (1- \lambda ) \overline{w}_2) \geq \lambda f_i(\overline{w}_1) + (1- \lambda ) f_i(\overline{w}_2) \) because \(f_i(\cdot )\) is concave. Since the logarithm is an increasing function, we can take the logarithm of both sides to obtain \( \mbox{log}[f_i(\lambda \overline{w}_1 + (1- \lambda ) \overline{w}_2)] \geq \mbox{log}[\lambda f_i(\overline{w}_1) + (1- \lambda ) f_i(\overline{w}_2)]\). Combining this inequality with Equation 6.18 using transitivity, we obtain \(\mbox{log}[f_i(\lambda \overline{w}_1 + (1- \lambda ) \overline{w}_2)] \geq \lambda \mbox{log}[ f_i(\overline{w}_1)] + (1 - \lambda ) \mbox{log}[f_i(\overline{w}_2)]\). In other words, \(\mbox{log}[f_i(\cdot )]\) is concave. More generally, we have just gone through all the steps required to show that the composition \(g(f(\cdot ))\) of two concave functions is concave, as long as \(g(\cdot )\) is non-decreasing. Closely related results are available in Lemma 4.3.2.
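
    A quick numerical sanity check of this composition argument, using a hypothetical concave function \(f(w) = 10 - w^2\) that stays positive on the chosen domain (the function and domain are illustrative assumptions, not taken from the chapter):

        import numpy as np

        rng = np.random.default_rng(0)

        def f(w):
            # Hypothetical concave function, positive on [-2, 2].
            return 10.0 - w ** 2

        # Check log(f(lam*w1 + (1-lam)*w2)) >= lam*log(f(w1)) + (1-lam)*log(f(w2)),
        # i.e., that log(f(.)) behaves as a concave function.
        w1 = rng.uniform(-2, 2, size=100_000)
        w2 = rng.uniform(-2, 2, size=100_000)
        lam = rng.uniform(0, 1, size=100_000)

        lhs = np.log(f(lam * w1 + (1 - lam) * w2))
        rhs = lam * np.log(f(w1)) + (1 - lam) * np.log(f(w2))
        print("violations:", int(np.sum(lhs < rhs - 1e-12)))  # expected: 0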

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Aggarwal, C.C. (2020). Constrained Optimization and Duality. In: Linear Algebra and Optimization for Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-40344-7_6

  • DOI: https://doi.org/10.1007/978-3-030-40344-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-40343-0

  • Online ISBN: 978-3-030-40344-7
