Abstract
Our final chapter on optimization provides a concrete introduction to several advanced topics. The first vignette describes classical penalty and barrier methods for constrained optimization [22, 37, 45]. Penalty methods operate on the exterior and barrier methods on the interior of the feasible region. Fortunately, it is fairly easy to prove global convergence for both methods under reasonable hypotheses.
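The exterior penalty idea described above can be illustrated with a minimal sketch (not taken from the chapter): to minimize f(x) = x² subject to x ≥ 1, one minimizes f(x) + μ·max(0, 1−x)² for an increasing sequence of penalty constants μ, so the unconstrained iterates approach the constrained optimum from outside the feasible region. The objective, constraint, and 1-D search routine here are illustrative choices, not the book's.

```python
# Exterior (quadratic) penalty method sketch: minimize f(x) = x^2
# subject to x >= 1. Hypothetical example for illustration only.

def penalized(x, mu):
    # objective plus quadratic penalty on the constraint violation
    viol = max(0.0, 1.0 - x)      # amount by which x >= 1 is violated
    return x**2 + mu * viol**2

def minimize_1d(f, lo=-5.0, hi=5.0, iters=200):
    # crude ternary search; adequate for this convex 1-D function
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = 0.0
for mu in [1.0, 10.0, 100.0, 1000.0]:
    # each unconstrained minimizer is x = mu / (1 + mu), which lies
    # just outside the feasible region and tends to x* = 1 as mu grows
    x = minimize_1d(lambda t: penalized(t, mu))

print(round(x, 3))  # → 0.999, approaching the constrained minimum x* = 1
```

Barrier methods would instead add a term such as −μ·log(x − 1) that blows up at the boundary, keeping iterates strictly interior while μ is driven to zero.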
References
Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363-366
Boyle JP, Dykstra RL (1985) A method for finding projections onto the intersection of convex sets in Hilbert space. In Advances in Order Restricted Statistical Inference, Lecture Notes in Statistics, Springer, New York, 28-47
Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Soviet Math Doklady 6:688-692
Candes EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Annals Stat 35:2313-2351
Candes EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 14:877-905
Censor Y, Reich S (1996) Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37:323-339
Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optimization Theory Appl 73:451-464
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33-61
Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826-844
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413-1457
de Leeuw J, Lange K (2007) Sharp quadratic majorization in one dimension.
Deutsch F (2001) Best Approximation in Inner Product Spaces. Springer, New York
Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425-455
Dykstra RL (1983) An algorithm for restricted least squares estimation. J Amer Stat Assoc 78:837-842
Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279-285
Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Philosophical Magazine 25:184-191
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Annals Stat 32:407-499
Elsner L, Koltracht L, Neumann M (1992) Convergence of sequential and asynchronous nonlinear paracontractions. Numerische Mathematik 62:305-319
Fang S-C, Puthenpura S (1993) Linear Optimization and Extensions: Theory and Algorithms. Prentice-Hall, Englewood Cliffs, NJ
Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proceedings American Control Conference 3:2156-2162
Ferguson TS (1996) A Course in Large Sample Theory. Chapman & Hall, London
Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Review 44:523-597
Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302-332
Friedman J, Hastie T, Tibshirani R (2009) Regularized paths for generalized linear models via coordinate descent. Technical Report, Stanford University Department of Statistics
Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comp Graph Stat 7:397-416
Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. Studies in Classification, Data Analysis, and Knowledge Organization, Lenz HJ, Decker R, Springer, Heidelberg-Berlin, pp 149-161
Hestenes MR (1981) Optimization Theory: The Finite Dimensional Case. Robert E Krieger Publishing, Huntington, NY
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Amer Statistician 58:30-37
Hunter DR, Li R (2005) Variable selection using MM algorithms. Annals Stat 33:1617-1642
Lange K (1994) An adaptive barrier method for convex programming. Methods Applications Analysis 1:392-402
Lange, K (2004) Optimization. Springer, New York
Lange K, Wu T (2007) An MM algorithm for multicategory vertex discriminant analysis. J Computational Graphical Stat 17:527-544
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788-791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13:556-562
Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245-263
Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Applied Signal Proc 2004:1762-1769
Luenberger DG (1984) Linear and Nonlinear Programming, 2nd ed. Addison-Wesley, Reading, MA
Meng X-L, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Amer Stat Assoc 86:899-909
Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex in R^n. J Optimization Theory Applications 50:195-200
Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30-50
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Applications 416:29-47
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279-300
Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307-1330
Ruszczynski A (2006) Nonlinear Optimization. Princeton University Press, Princeton, NJ
Silvey SD (1975) Statistical Inference. Chapman & Hall, London
Schölkopf B, Smola AJ (2002) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA
Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ1 norm. Geophysics 44:39-52
Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Operations Research 17:670-690
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc, Series B 58:267-288
Vapnik V (1995) The Nature of Statistical Learning Theory. Springer, New York
Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. Proceedings of the Sixth International Conference on Data Mining (ICDM’06). IEEE Computer Society, pp 690-700
Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148-159
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero norm with linear models and kernel methods. J Machine Learning Research 3:1439-1461
Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224-244
Copyright information
© 2010 Springer New York
Cite this chapter
Lange, K. (2010). Advanced Optimization Topics. In: Numerical Analysis for Statisticians. Statistics and Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5945-4_16
DOI: https://doi.org/10.1007/978-1-4419-5945-4_16
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5944-7
Online ISBN: 978-1-4419-5945-4
eBook Packages: Mathematics and Statistics (R0)