Abstract
Penalties and barriers feature prominently in two areas of modern optimization theory. First, both devices are employed to solve constrained optimization problems [96, 183, 226]. The general idea is to replace hard constraints by penalties or barriers and then exploit the well-oiled machinery for solving unconstrained problems. Penalty methods operate on the exterior of the feasible region and barrier methods on the interior. The strength of a penalty or barrier is determined by a tuning constant. In classical penalty methods, a single global tuning constant is gradually sent to ∞; in barrier methods, it is gradually sent to 0. Nothing prevents one from assigning different tuning constants to different penalties or barriers in the same problem. Either strategy generates a sequence of solutions that converges in practice to the solution of the original constrained optimization problem.
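The contrast between the two devices can be seen on a one-dimensional toy problem. The sketch below is illustrative only and is not taken from the chapter: the problem (minimize (x − 2)² subject to x ≤ 1), the quadratic penalty, the logarithmic barrier, and every function name are assumptions chosen for simplicity. A plain golden-section search stands in for the unconstrained solver.

```python
import math


def golden_section(g, lo, hi, tol=1e-10):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if g(c) < g(d):
            b = d  # a minimizer lies in [a, d]
        else:
            a = c  # a minimizer lies in [c, b]
    return 0.5 * (a + b)


def f(x):
    """Toy objective (an assumption): the unconstrained minimum is at x = 2."""
    return (x - 2.0) ** 2


def penalized(x, rho):
    """Quadratic penalty for x <= 1; it is zero on the feasible region."""
    return f(x) + rho * max(0.0, x - 1.0) ** 2


def barrier(x, mu):
    """Logarithmic barrier for x <= 1; infinite outside the interior x < 1."""
    if x >= 1.0:
        return math.inf
    return f(x) - mu * math.log(1.0 - x)


def penalty_path(rhos):
    """Minimizers of the penalized objective as the tuning constant grows."""
    return [golden_section(lambda x: penalized(x, rho), -10.0, 10.0)
            for rho in rhos]


def barrier_path(mus):
    """Minimizers of the barrier objective as the tuning constant shrinks."""
    return [golden_section(lambda x: barrier(x, mu), -10.0, 1.0)
            for mu in mus]


if __name__ == "__main__":
    for rho, x in zip([1.0, 10.0, 100.0, 1000.0],
                      penalty_path([1.0, 10.0, 100.0, 1000.0])):
        print(f"penalty rho={rho:g}: x={x:.6f}")
    for mu, x in zip([1.0, 0.1, 0.01, 0.001],
                     barrier_path([1.0, 0.1, 0.01, 0.001])):
        print(f"barrier mu={mu:g}: x={x:.6f}")
```

For this toy problem the minimizers have closed forms, x = (2 + ρ)/(1 + ρ) for the penalty and x = (3 − √(1 + 2μ))/2 for the barrier. The penalty iterates therefore approach the constrained solution x = 1 from outside the feasible region as ρ grows, and the barrier iterates approach it from inside as μ shrinks.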
References
Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363–366
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Sov Math Dokl 6:688–692
Brinkhuis J, Tikhomirov V (2005) Optimization: insights and applications. Princeton University Press, Princeton
Candès EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2351
Candès EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 14:877–905
Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optim Theor Appl 73:451–464
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivariate Anal 100:1367–1383
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33–61
Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826–844
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1457
de Leeuw J, Lange K (2009) Sharp quadratic majorization in one dimension. Comput Stat Data Anal 53:2471–2484
Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279–285
Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Phil Mag 25:184–191
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Fang S-C, Puthenpura S (1993) Linear optimization and extensions: theory and algorithms. Prentice-Hall, Englewood Cliffs
Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Contr Conf 3:2156–2162
Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Rev 44:523–597
Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
Friedman J, Hastie T, Tibshirani R (2009) Regularization paths for generalized linear models via coordinate descent. Technical Report, Department of Statistics, Stanford University
Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comput Graph Stat 7:397–416
Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. In: Lenz HJ, Decker R (eds) Studies in classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 149–161
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
Lange K (1994) An adaptive barrier method for convex programming. Meth Appl Anal 1:392–402
Lange K, Wu T (2008) An MM algorithm for multicategory vertex discriminant analysis. J Comput Graph Stat 17:527–544
Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245–263
Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Appl Signal Process 2004:1762–1769
Luenberger DG (1984) Linear and nonlinear programming, 2nd edn. Addison-Wesley, Reading
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
Nemirovski AS, Todd MJ (2008) Interior-point methods for optimization. Acta Numerica 17:191–234
Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. Springer, New York
Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30–50
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279–300
Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton
Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307–1330
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT, Cambridge
Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ1 norm. Geophysics 44:39–52
Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Oper Res 17:670–690
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Proceedings of the sixth international conference on data mining (ICDM’06). IEEE Computer Society, Washington, DC, pp 690–700
Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148–159
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3:1439–1461
Wright MH (2005) The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull Am Math Soc 42:39–56
Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224–244
Wu TT, Lange K (2010) Multicategory vertex discriminant analysis for high-dimensional data. Ann Appl Stat 4:1698–1721
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Lange, K. (2013). Penalty and Barrier Methods. In: Optimization. Springer Texts in Statistics, vol 95. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5838-8_13
DOI: https://doi.org/10.1007/978-1-4614-5838-8_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5837-1
Online ISBN: 978-1-4614-5838-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)