Optimization, pp 313–339

Penalty and Barrier Methods

Part of the book series: Springer Texts in Statistics (STS, volume 95)

Abstract

Penalties and barriers feature prominently in two areas of modern optimization theory. First, both devices are employed to solve constrained optimization problems [96, 183, 226]. The general idea is to replace hard constraints by penalties or barriers and then exploit the well-oiled machinery for solving unconstrained problems. Penalty methods operate on the exterior of the feasible region and barrier methods on the interior. The strength of a penalty or barrier is determined by a tuning constant. In classical penalty methods, a single global tuning constant is gradually sent to ∞; in barrier methods, it is gradually sent to 0. Nothing prevents one from assigning different tuning constants to different penalties or barriers in the same problem. Either strategy generates a sequence of solutions that converges in practice to the solution of the original constrained optimization problem.
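The two devices described above can be illustrated on a toy problem. The following Python sketch is not from the chapter; the specific objective, constraint, and use of scipy.optimize.minimize_scalar are my own illustrative choices. It applies a quadratic exterior penalty with tuning constant μ sent toward ∞, and a logarithmic interior barrier with μ sent toward 0, to minimizing f(x) = x² subject to x ≥ 1; both sequences of solutions approach the constrained minimizer x* = 1.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem: minimize f(x) = x^2 subject to x >= 1.
# The unconstrained minimum (x = 0) is infeasible, so the
# constraint is active at the solution x* = 1.

def penalty_solve(mu):
    # Exterior quadratic penalty: constraint violations are
    # charged mu * max(0, 1 - x)^2; feasible points pay nothing.
    obj = lambda x: x**2 + mu * max(0.0, 1.0 - x)**2
    return minimize_scalar(obj, bounds=(-10.0, 10.0), method="bounded").x

def barrier_solve(mu):
    # Interior log barrier: -mu * log(x - 1) blows up as x
    # approaches the boundary from inside the feasible region.
    obj = lambda x: x**2 - mu * np.log(x - 1.0)
    return minimize_scalar(obj, bounds=(1.0 + 1e-9, 10.0), method="bounded").x

for mu in [1.0, 10.0, 100.0, 1000.0]:
    # Penalty iterates approach x* = 1 from outside the feasible region.
    print(f"penalty mu={mu:7.1f}  x = {penalty_solve(mu):.6f}")
for mu in [1.0, 0.1, 0.01, 0.001]:
    # Barrier iterates approach x* = 1 from inside the feasible region.
    print(f"barrier mu={mu:7.3f}  x = {barrier_solve(mu):.6f}")
```

The opposite approach directions are visible in the output: the penalty solutions sit slightly below 1 (exterior, infeasible until the limit), while the barrier solutions sit slightly above 1 (interior, always feasible).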


References

  1. Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363–366
  2. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
  3. Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Sov Math Dokl 6:688–692
  4. Brinkhuis J, Tikhomirov V (2005) Optimization: insights and applications. Princeton University Press, Princeton
  5. Candès EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2351
  6. Candès EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 14:877–905
  7. Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optim Theor Appl 73:451–464
  8. Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivariate Anal 100:1367–1383
  9. Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33–61
  10. Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826–844
  11. Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1457
  12. de Leeuw J, Lange K (2009) Sharp quadratic majorization in one dimension. Comput Stat Data Anal 53:2471–2484
  13. Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
  14. Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279–285
  15. Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Phil Mag 25:184–191
  16. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
  17. Fang S-C, Puthenpura S (1993) Linear optimization and extensions: theory and algorithms. Prentice-Hall, Englewood Cliffs
  18. Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Contr Conf 3:2156–2162
  19. Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Rev 44:523–597
  20. Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
  21. Friedman J, Hastie T, Tibshirani R (2009) Regularized paths for generalized linear models via coordinate descent. Technical report, Department of Statistics, Stanford University
  22. Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comput Graph Stat 7:397–416
  23. Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. In: Lenz HJ, Decker R (eds) Studies in classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 149–161
  24. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
  25. Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
  26. Lange K (1994) An adaptive barrier method for convex programming. Meth Appl Anal 1:392–402
  27. Lange K, Wu T (2008) An MM algorithm for multicategory vertex discriminant analysis. J Comput Graph Stat 17:527–544
  28. Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A. DOI 10.1007/s10107-012-0612-1
  29. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
  30. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
  31. Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245–263
  32. Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Appl Signal Process 2004:1762–1769
  33. Luenberger DG (1984) Linear and nonlinear programming, 2nd edn. Addison-Wesley, Reading
  34. McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
  35. Nemirovski AS, Todd MJ (2008) Interior-point methods for optimization. Acta Numerica 17:191–234
  36. Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. Springer, New York
  37. Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30–50
  38. Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47
  39. Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279–300
  40. Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton
  41. Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307–1330
  42. Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
  43. Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ1 norm. Geophysics 44:39–52
  44. Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Oper Res 17:670–690
  45. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
  46. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
  47. Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Proceedings of the sixth international conference on data mining (ICDM'06). IEEE Computer Society, Washington, DC, pp 690–700
  48. Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148–159
  49. Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3:1439–1461
  50. Wright MH (2005) The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull Am Math Soc 42:39–56
  51. Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224–244
  52. Wu TT, Lange K (2010) Multicategory vertex discriminant analysis for high-dimensional data. Ann Appl Stat 4:1698–1721


Copyright information

© 2013 Springer Science+Business Media New York


Cite this chapter

Lange, K. (2013). Penalty and Barrier Methods. In: Optimization. Springer Texts in Statistics, vol 95. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5838-8_13
