Penalty and Barrier Methods

Lange, Kenneth

doi:10.1007/978-1-4614-5838-8_13

Chapter
First Online: 21 October 2012

Part of the book series: Springer Texts in Statistics (STS, volume 95)

Abstract

Penalties and barriers feature prominently in two areas of modern optimization theory. First, both devices are employed to solve constrained optimization problems [96, 183, 226]. The general idea is to replace hard constraints by penalties or barriers and then exploit the well-oiled machinery for solving unconstrained problems. Penalty methods operate on the exterior of the feasible region and barrier methods on the interior. The strength of a penalty or barrier is determined by a tuning constant. In classical penalty methods, a single global tuning constant is gradually sent to ∞; in barrier methods, it is gradually sent to 0. Nothing prevents one from assigning different tuning constants to different penalties or barriers in the same problem. Either strategy generates a sequence of solutions that converges in practice to the solution of the original constrained optimization problem.
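
As a minimal sketch of the two strategies described above, the following Python example applies a quadratic penalty and a logarithmic barrier to a toy problem that is not taken from the chapter: minimize (x - 2)^2 subject to x <= 1, whose solution is x = 1. The objective, the constraint, the tuning-constant schedules, and the helper names (`golden_section`, `penalty_path`, `barrier_path`) are all illustrative assumptions, and golden-section search is only one convenient stand-in for an unconstrained solver.

```python
import math


def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimizer lies in [a, d]
        else:
            a = c  # the minimizer lies in [c, b]
    return 0.5 * (a + b)


def objective(x):
    # Toy objective (an assumption for illustration, not from the chapter).
    return (x - 2.0) ** 2


def penalty_path(rhos):
    """Exterior path: minimize f(x) + rho * max(0, x - 1)^2 for each rho."""
    path = []
    for rho in rhos:
        penalized = lambda x, r=rho: objective(x) + r * max(0.0, x - 1.0) ** 2
        path.append(golden_section(penalized, -5.0, 5.0))
    return path


def barrier_path(mus):
    """Interior path: minimize f(x) - mu * log(1 - x) over x < 1 for each mu."""
    path = []
    for mu in mus:
        barred = lambda x, m=mu: objective(x) - m * math.log(1.0 - x)
        # Keep the search interval strictly inside the feasible region.
        path.append(golden_section(barred, -5.0, 1.0 - 1e-12))
    return path


# Penalty constant sent toward infinity; barrier constant sent toward 0.
penalty_iterates = penalty_path([1.0, 10.0, 100.0, 1000.0])
barrier_iterates = barrier_path([1.0, 0.1, 0.01, 0.001])

print("penalty iterates:", penalty_iterates)
print("barrier iterates:", barrier_iterates)
```

Under these assumptions, the penalty iterates should approach 1 from above (outside the feasible region) as `rho` grows, and the barrier iterates should approach 1 from below (inside it) as `mu` shrinks.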

References

Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363–366
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Sov Math Dokl 6:688–692
Brinkhuis J, Tikhomirov V (2005) Optimization: insights and applications. Princeton University Press, Princeton
Candès EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2351
Candès EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ₁ minimization. J Fourier Anal Appl 14:877–905
Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optim Theor Appl 73:451–464
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivariate Anal 100:1367–1383
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33–61
Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826–844
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1457
de Leeuw J, Lange K (2009) Sharp quadratic majorization in one dimension. Comput Stat Data Anal 53:2471–2484
Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279–285
Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Phil Mag 25:184–191
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Fang S-C, Puthenpura S (1993) Linear optimization and extensions: theory and algorithms. Prentice-Hall, Englewood Cliffs
Fazel M, Hindi H, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Contr Conf 3:2156–2162
Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Rev 44:523–597
Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
Friedman J, Hastie T, Tibshirani R (2009) Regularization paths for generalized linear models via coordinate descent. Technical Report, Department of Statistics, Stanford University
Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comput Graph Stat 7:397–416
Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. In: Lenz HJ, Decker R (eds) Studies in classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 149–161
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
Lange K (1994) An adaptive barrier method for convex programming. Meth Appl Anal 1:392–402
Lange K, Wu T (2008) An MM algorithm for multicategory vertex discriminant analysis. J Comput Graph Stat 17:527–544
Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245–263
Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Appl Signal Process 2004:1762–1769
Luenberger DG (1984) Linear and nonlinear programming, 2nd edn. Addison-Wesley, Reading
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
Nemirovski AS, Todd MJ (2008) Interior-point methods for optimization. Acta Numerica 17:191–234
Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. Springer, New York
Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30–50
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279–300
Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton
Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307–1330
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT, Cambridge
Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ₁ norm. Geophysics 44:39–52
Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Oper Res 17:670–690
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Proceedings of the sixth international conference on data mining (ICDM’06). IEEE Computer Society, Washington, DC, pp 690–700
Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148–159
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3:1439–1461
Wright MH (2005) The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull Am Math Soc 42:39–56
Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224–244
Wu TT, Lange K (2010) Multicategory vertex discriminant analysis for high-dimensional data. Ann Appl Stat 4:1698–1721

Author information

Authors and Affiliations

Biomathematics, Human Genetics, Statistics, University of California, Los Angeles, CA, USA
Kenneth Lange

About this chapter

Cite this chapter

Lange, K. (2013). Penalty and Barrier Methods. In: Optimization. Springer Texts in Statistics, vol 95. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5838-8_13

DOI: https://doi.org/10.1007/978-1-4614-5838-8_13
Published: 21 October 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5837-1
Online ISBN: 978-1-4614-5838-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
