Abstract
This chapter provides a concrete introduction to several advanced topics in optimization theory. Specifying an interior feasible point is the first issue that must be faced in applying a barrier method. Given an exterior point, Dykstra’s algorithm [21, 70, 79] finds the closest point to it in the intersection \(\cap_{i=0}^{r-1} C_{i}\) of a finite number of closed convex sets. If \(C_{i}\) is defined by the convex constraint \(h_{i}(\boldsymbol{x}) \leq 0\), then one obvious tactic for finding an interior point is to replace \(C_{i}\) by the set \(C_{i}(\epsilon) = \{\boldsymbol{x} : h_{i}(\boldsymbol{x}) \leq -\epsilon\}\) for some small \(\epsilon > 0\). Projecting onto the intersection of the \(C_{i}(\epsilon)\) then produces an interior point, provided this intersection is nonempty: every point \(\boldsymbol{x}\) in it satisfies \(h_{i}(\boldsymbol{x}) \leq -\epsilon < 0\) for all \(i\).
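A minimal Python sketch of the tactic just described follows. It assumes each set comes with an exact projection operator and uses the cyclic form of Dykstra's algorithm: each sweep projects the current iterate, shifted by a per-set correction vector, onto each set in turn and then updates that set's correction. The test problem (the planar constraints \(h_{0}(\boldsymbol{x}) = \|\boldsymbol{x}\| - 1 \leq 0\) and \(h_{1}(\boldsymbol{x}) = x_{1} - 0.5 \leq 0\)), the helper names `dykstra`, `ball_projector`, and `halfspace_projector`, and the fixed sweep count are hypothetical choices made for illustration, not code from the chapter.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Projector = Callable[[Sequence[float]], Vector]


def dykstra(y: Sequence[float], projectors: Sequence[Projector],
            sweeps: int = 1000) -> Vector:
    """Approximate the projection of y onto the intersection of closed convex sets.

    projectors[i] must return the exact projection onto the set C_i.
    A fixed number of sweeps is used here for simplicity (an assumption).
    """
    x = list(y)
    # One correction vector per set; these distinguish Dykstra's algorithm
    # from plain cyclic projection.
    corrections = [[0.0] * len(x) for _ in projectors]
    for _ in range(sweeps):
        for i, project in enumerate(projectors):
            shifted = [xj + pj for xj, pj in zip(x, corrections[i])]
            z = project(shifted)
            corrections[i] = [sj - zj for sj, zj in zip(shifted, z)]
            x = z
    return x


def ball_projector(radius: float) -> Projector:
    """Projection onto the ball {x : ||x|| <= radius} centred at the origin."""
    def project(v: Sequence[float]) -> Vector:
        norm = math.sqrt(sum(vj * vj for vj in v))
        if norm <= radius:
            return list(v)
        return [radius * vj / norm for vj in v]
    return project


def halfspace_projector(a: Sequence[float], b: float) -> Projector:
    """Projection onto the half-space {x : a . x <= b}, with a nonzero."""
    a_sq = sum(aj * aj for aj in a)

    def project(v: Sequence[float]) -> Vector:
        excess = sum(aj * vj for aj, vj in zip(a, v)) - b
        if excess <= 0:
            return list(v)
        return [vj - excess * aj / a_sq for vj, aj in zip(v, a)]
    return project


# Hypothetical constraints for illustration:
#   h_0(x) = ||x|| - 1 <= 0   and   h_1(x) = x_1 - 0.5 <= 0.
# Shrinking to h_i(x) <= -eps gives the ball of radius 1 - eps and the
# half-space x_1 <= 0.5 - eps.
eps = 0.1
shrunken_sets = [ball_projector(1.0 - eps),
                 halfspace_projector([1.0, 0.0], 0.5 - eps)]
x = dykstra([2.0, 2.0], shrunken_sets)
h = [math.sqrt(x[0] ** 2 + x[1] ** 2) - 1.0, x[0] - 0.5]
print(x, h)  # both h values should be close to -eps, hence negative
```

With \(\epsilon = 0.1\) and exterior point \((2, 2)\), the shrunken sets are the disk of radius 0.9 and the half-plane \(x_{1} \leq 0.4\). Working through the optimality conditions suggests the projection is the corner \((0.4, \sqrt{0.65}) \approx (0.4, 0.806)\), where both original constraints hold strictly (\(h_{0} \approx -0.1\), \(h_{1} = -0.1\)). The correction vectors are what separate Dykstra's algorithm from plain cyclic projection, which finds some point of the intersection but not, in general, the closest one.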
References
Acosta E, Delgado C (1994) Fréchet versus Carathéodory. Am Math Mon 101:332–338
Acton FS (1990) Numerical methods that work. Mathematical Association of America, Washington, DC
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley, Hoboken
Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363–366
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: 2007 symposium on discrete algorithms (SODA). Society for Industrial and Applied Mathematics, Philadelphia
Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972) Statistical inference under order restrictions; the theory and application of isotonic regression. Wiley, New York
Bartle RG (1996) Return to the Riemann integral. Am Math Mon 103:625–632
Baum LE (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8
Bauschke HH, Lewis AS (2000) Dykstra’s algorithm with Bregman projections: a convergence proof. Optimization 48:409–427
Beltrami EJ (1970) An algorithmic approach to nonlinear analysis and optimization. Academic, New York
Berry MW, Drmac Z, Jessup ER (1999) Matrices, vector spaces, and information retrieval. SIAM Rev 41:335–362
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont
Bertsekas DP (2009) Convex optimization theory. Athena Scientific, Belmont
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT, Cambridge
Bliss GA (1925) Calculus of variations. Mathematical Association of America, Washington, DC
Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Inst Stat Math 40:641–663
Borwein JM, Lewis AS (2000) Convex analysis and nonlinear optimization: theory and examples. Springer, New York
Botsko MW, Gosser RA (1985) On the differentiability of functions of several variables. Am Math Mon 92:663–665
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Boyd S, Kim SJ, Vandenberghe L, Hassibi A (2007) A tutorial on geometric programming. Optim Eng 8:67–127
Boyle JP, Dykstra RL (1985) A method for finding projections onto the intersection of convex sets in Hilbert space. In: Advances in order restricted statistical inference. Lecture notes in statistics. Springer, New York, pp 28–47
Bradley EL (1973) The equivalence of maximum likelihood and weighted least squares estimates in the exponential family. J Am Stat Assoc 68:199–200
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324–345
Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Sov Math Dokl 6:688–692
Bregman LM (1967) The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phy 7:200–217
Bregman LM, Censor Y, Reich S (2000) Dykstra’s algorithm as the nonlinear extension of Bregman’s optimization method. J Convex Anal 6:319–333
Brent RP (1973) Some efficient algorithms for solving systems of nonlinear equations. SIAM J Numer Anal 10:327–344
Brezhneva OA, Tret’yakov AA, Wright SE (2010) A simple and elementary proof of the Karush-Kuhn-Tucker theorem for inequality-constrained optimization. Optim Lett 3:7–10
Bridger M, Stolzenberg G (1999) Uniform calculus and the law of bounded change. Am Math Mon 106:628–635
Brinkhuis J, Tikhomirov V (2005) Optimization: insights and applications. Princeton University Press, Princeton
Brophy JF, Smith PW (1988) Prototyping Karmarkar’s algorithm using MATH-PROTRAN. IMSL Dir 5:2–3
Broyden CG (1965) A class of methods for solving nonlinear simultaneous equations. Math Comput 19:577–593
Byrd RH, Nocedal J (1989) A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J Numer Anal 26:727–739
Byrne CL (2009) A first course in optimization. Department of Mathematical Sciences, University of Massachusetts Lowell, Lowell
Cai J-F, Candès EJ, Shen Z (2008) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20:1956–1982
Candès EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2351
Candès EJ, Tao T (2009) The power of convex relaxation: near-optimal matrix completion. IEEE Trans Inform Theor 56:2053–2080
Candès EJ, Romberg J, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Comm Pure Appl Math 59:1207–1223
Candès EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ1 minimization. J Fourier Anal Appl 14:877–905
Carathéodory C (1954) Theory of functions of a complex variable, vol 1. Chelsea, New York
Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optim Theor Appl 73:451–464
Censor Y, Chen W, Combettes PL, Davidi R, Herman GT (2012) On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput Optim Appl 51:1065–1088
Charnes A, Frome EL, Yu PL (1976) The equivalence of generalized least squares and maximum likelihood in the exponential family. J Am Stat Assoc 71:169–171
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivariate Anal 100:1367–1383
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33–61
Cheney W (2001) Analysis for applied mathematics. Springer, New York
Choi SC, Wette R (1969) Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics 11:683–690
Ciarlet PG (1989) Introduction to numerical linear algebra and optimization. Cambridge University Press, Cambridge
Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826–844
Clarke CA, Price Evans DA, McConnell RB, Sheppard PM (1959) Secretion of blood group antigens and peptic ulcers. Br Med J 1:603–607
Conn AR, Gould NIM, Toint PL (1991) Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math Program 50:177–195
Conte SD, de Boor C (1972) Elementary numerical analysis. McGraw-Hill, New York
Cox DR (1970) Analysis of binary data. Methuen, London
Danskin JM (1966) The theory of max-min, with applications. SIAM J Appl Math 14:641–664
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1457
Davidon WC (1959) Variable metric methods for minimization. AEC Research and Development Report ANL–5990, Argonne National Laboratory, Argonne
Davis JA, Smith TW (2008) General social surveys, 1972–2008 [machine-readable data file]. Roper Center for Public Opinion Research, University of Connecticut, Storrs
Debreu G (1952) Definite and semidefinite quadratic forms. Econometrica 20:295–300
de Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock HH, Lenski W, Richter MM (eds) Information systems and data analysis. Springer, New York, pp 308–325
de Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
de Leeuw J, Heiser WJ (1980) Multidimensional scaling with restrictions on the configuration. In: Krishnaiah PR (ed) Multivariate analysis, vol V. North-Holland, Amsterdam, pp 501–522
de Leeuw J, Lange K (2009) Sharp quadratic majorization in one dimension. Comput Stat Data Anal 53:2471–2484
Delfour MC (2012) Introduction to optimization and semidifferential calculus. SIAM, Philadelphia
Demmel J (1997) Applied numerical linear algebra. SIAM, Philadelphia
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1–38
Dennis JE Jr, Schnabel RB (1996) Numerical methods for unconstrained optimization and nonlinear equations. SIAM, Philadelphia
De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imag 12:328–333
DePree JD, Swartz CW (1988) Introduction to real analysis. Wiley, Hoboken
de Souza PN, Silva J-N (2001) Berkeley problems in mathematics, 2nd edn. Springer, New York
Deutsch F (2001) Best approximation in inner product spaces. Springer, New York
Devijver PA (1985) Baum’s forward-backward algorithm revisited. Pattern Recogn Lett 3:369–373
Ding C, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32:45–55
Dobson AJ (1990) An introduction to generalized linear models. Chapman & Hall, London
Donoho DL (2006) Compressed sensing. IEEE Trans Inform Theor 52:1289–1306
Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
Duan J-C, Simonato J-G (1993) Multiplicity of solutions in maximum likelihood factor analysis. J Stat Comput Simul 47:37–47
Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the ℓ1-ball for learning in high dimensions. In: Proceedings of the 25th international conference on machine learning (ICML 2008). ACM, New York, pp 272–279
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Dykstra RL (1983) An algorithm for restricted least squares estimation. J Am Stat Assoc 78:837–842
Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279–285
Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Phil Mag 25:184–191
Edwards CH Jr (1973) Advanced calculus of several variables. Academic, New York
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Ekeland I (1974) On the variational principle. J Math Anal Appl 47:324–353
Elsner L, Koltracht L, Neumann M (1992) Convergence of sequential and asynchronous nonlinear paracontractions. Numer Math 62:305–319
Everitt BS (1977) The analysis of contingency tables. Chapman & Hall, London
Fang S-C, Puthenpura S (1993) Linear optimization and extensions: theory and algorithms. Prentice-Hall, Englewood Cliffs
Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Contr Conf 3:2156–2162
Feller W (1971) An introduction to probability theory and its applications, vol 2, 2nd edn. Wiley, Hoboken
Fessler JA, Clinthorne NH, Rogers WL (1993) On complete-data spaces for PET reconstruction algorithms. IEEE Trans Nucl Sci 40:1055–1061
Fiacco AV, McCormick GP (1968) Nonlinear programming: sequential unconstrained minimization techniques. Wiley, Hoboken
Fletcher R (2000) Practical methods of optimization, 2nd edn. Wiley, Hoboken
Fletcher R, Powell MJD (1963) A rapidly convergent descent method for minimization. Comput J 6:163–168
Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Comput J 7:149–154
Flury B, Zoppè A (2000) Exercises in EM. Am Stat 54:207–209
Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Rev 44:523–597
Franklin J (1983) Mathematical methods of economics. Am Math Mon 90:229–244
Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
Friedman J, Hastie T, Tibshirani R (2009) Regularization paths for generalized linear models via coordinate descent. Technical Report, Department of Statistics, Stanford University
Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comput Graph Stat 7:397–416
Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498
Gelfand IM, Fomin SV (1963) Calculus of variations. Prentice-Hall, Englewood Cliffs
Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 12–18
Gifi A (1990) Nonlinear multivariate analysis. Wiley, Hoboken
Gill PE, Murray W, Wright MH (1991) Numerical linear algebra and optimization, vol 1. Addison-Wesley, Redwood City
Goldstein T, Osher S (2009) The split Bregman method for ℓ1-regularized problems. SIAM J Imag Sci 2:323–343
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Gordon RA (1998) The use of tagged partitions in elementary real analysis. Am Math Mon 105:107–117
Gould NIM (2008) How good are projection methods for convex feasibility problems? Comput Optim Appl 40:1–12
Green PJ (1984) Iteratively reweighted least squares for maximum likelihood estimation and some robust and resistant alternatives (with discussion). J Roy Stat Soc B 46:149–192
Green PJ (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imag 9:84–94
Green PJ (1990) On use of the EM algorithm for penalized likelihood estimation. J Roy Stat Soc B 52:443–452
Grimmett GR, Stirzaker DR (1992) Probability and random processes, 2nd edn. Oxford University Press, Oxford
Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. In: Lenz HJ, Decker R (eds) Studies in classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 149–161
Guillemin V, Pollack A (1974) Differential topology. Prentice-Hall, Englewood Cliffs
Güler O (2010) Foundations of optimization. Springer, New York
Hämmerlin G, Hoffmann K-H (1991) Numerical mathematics. Springer, New York
Hardy GH, Littlewood JE, Pólya G (1952) Inequalities, 2nd edn. Cambridge University Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
He L, Marquina A, Osher S (2005) Blind deconvolution using TV regularization and Bregman iteration. Int J Imag Syst Technol 15:74–83
Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356
Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent advances in descriptive multivariate analysis. Clarendon, Oxford, pp 157–189
Henrici P (1982) Essentials of numerical analysis with pocket calculator demonstrations. Wiley, Hoboken
Herman GT (1980) Image reconstruction from projections: the fundamentals of computerized tomography. Springer, New York
Hestenes MR (1981) Optimization theory: the finite dimensional case. Robert E Krieger Publishing, Huntington
Hestenes MR, Karush WE (1951) A method of gradients for the calculation of the characteristic roots and vectors of a real symmetric matrix. J Res Natl Bur Stand 47:471–478
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 29:409–439
Higham NJ (2008) Functions of matrices: theory and computation. SIAM, Philadelphia
Hille E (1959) Analytic function theory, vol 1. Blaisdell, New York
Hiriart-Urruty J-B (1986) When is a point x satisfying ∇f(x) = 0 a global minimum of f(x)? Am Math Mon 93:556–558
Hiriart-Urruty J-B, Lemaréchal C (2001) Fundamentals of convex analysis. Springer, New York
Hochstadt H (1986) The functions of mathematical physics. Dover, New York
Hoel PG, Port SC, Stone CJ (1971) Introduction to probability theory. Houghton Mifflin, Boston
Hoffman K (1975) Analysis in Euclidean space. Prentice-Hall, Englewood Cliffs
Hoffman K, Kunze R (1971) Linear algebra, 2nd edn. Prentice-Hall, Englewood Cliffs
Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
Horn RA, Johnson CR (1991) Topics in matrix analysis. Cambridge University Press, Cambridge
Householder AS (1975) The theory of matrices in numerical analysis. Dover, New York
Hrusa W, Troutman JL (1981) Elementary characterization of classical minima. Am Math Mon 88:321–327
Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann Stat 32:386–408
Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9:60–77
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58:30–37
Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
Jamshidian M, Jennrich RI (1995) Acceleration of the EM algorithm by using quasi-Newton methods. J Roy Stat Soc B 59:569–587
Jamshidian M, Jennrich RI (1997) Quasi-Newton acceleration of the EM algorithm. J Roy Stat Soc B 59:569–587
Jennrich RI, Moore RH (1975) Maximum likelihood estimation by means of nonlinear least squares. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 57–65
Jia R-Q, Zhao H, Zhao W (2009) Convergence analysis of the Bregman method for the variational model of image denoising. Appl Comput Harmon Anal 27:367–379
Karlin S, Taylor HM (1975) A first course in stochastic processes, 2nd edn. Academic, New York
Karush W (1939) Minima of functions of several variables with inequalities as side conditions. Master’s Thesis, Department of Mathematics, University of Chicago, Chicago
Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Rev 35:80–93
Kelley CT (1999) Iterative methods for optimization. SIAM, Philadelphia
Khalfan HF, Byrd RH, Schnabel RB (1993) A theoretical and experimental study of the symmetric rank-one update. SIAM J Optim 3:1–24
Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266
Kingman JFC (1993) Poisson processes. Oxford University Press, Oxford
Komiya H (1988) Elementary proof for Sion’s minimax theorem. Kodai Math J 11:5–7
Kosowsky JJ, Yuille AL (1994) The invisible hand algorithm: solving the assignment problem with statistical physics. Neural Netw 7:477–490
Kruskal JB (1964) Nonmetric multidimensional scaling: a numerical method. Psychometrika 29:115–129
Kruskal JB (1965) Analysis of factorial experiments by estimating monotone transformations of the data. J Roy Stat Soc B 27:251–263
Ku HH, Kullback S (1974) Log-linear models in contingency table analysis. Biometrics 10:452–458
Kuhn S (1991) The derivative à la Carathéodory. Am Math Mon 98:40–44
Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceedings of the second Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley
Lange K (1994) An adaptive barrier method for convex programming. Meth Appl Anal 1:392–402
Lange K (1995) A gradient algorithm locally equivalent to the EM algorithm. J Roy Stat Soc B 57:425–437
Lange K (1995) A quasi-Newton acceleration of the EM algorithm. Stat Sin 5:1–18
Lange K (2002) Mathematical and statistical methods for genetic analysis, 2nd edn. Springer, New York
Lange K (2010) Numerical analysis for statisticians, 2nd edn. Springer, New York
Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Comput Assist Tomogr 8:306–316
Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Process 4:1430–1438
Lange K, Wu T (2008) An MM algorithm for multicategory vertex discriminant analysis. J Comput Graph Stat 17:527–544
Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Comput Graph Stat 9:1–59
Lax PD (2007) Linear algebra and its applications, 2nd edn. Wiley, Hoboken
Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivariate Anal 88:365–411
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
Lehmann EL (1986) Testing statistical hypotheses, 2nd edn. Wiley, Hoboken
Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245–263
Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Appl Signal Process 2004:1762–1769
Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, Hoboken
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J Roy Stat Soc B 44:226–233
Luce RD (1959) Individual choice behavior: a theoretical analysis. Wiley, Hoboken
Luce RD (1977) The choice axiom after twenty years. J Math Psychol 15:215–233
Luenberger DG (1984) Linear and nonlinear programming, 2nd edn. Addison-Wesley, Reading
Magnus JR, Neudecker H (1988) Matrix differential calculus with applications in statistics and econometrics. Wiley, Hoboken
Maher MJ (1982) Modelling association football scores. Stat Neerl 36:109–118
Mangasarian OL, Fromovitz S (1967) The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J Math Anal Appl 17:37–47
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic, New York
Marsden JE, Hoffman MJ (1993) Elementary classical analysis, 2nd edn. W H Freeman & Co, New York
Mazumder R, Hastie T, Tibshirani R (2010) Spectral regularization algorithms for learning large incomplete matrices. J Mach Learn Res 11:2287–2322
McLachlan GJ, Do K-A, Ambroise C (2004) Analyzing microarray gene expression data. Wiley, Hoboken
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, Hoboken
McLeod RM (1980) The generalized Riemann integral. Mathematical Association of America, Washington, DC
McShane EJ (1973) The Lagrange multiplier rule. Am Math Mon 80:922–925
Meyer RR (1976) Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J Comput Syst Sci 12:108–121
Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex in \(\mathbb{R}^{n}\). J Optim Theor Appl 50:195–200
Miller KS (1987) Some eclectic matrix theory. Robert E Krieger Publishing, Malabar
Moré JJ, Sorensen DC (1983) Computing a trust region step. SIAM J Sci Stat Comput 4:553–572
Narayanan A (1991) Algorithm AS 266: maximum likelihood estimation of the parameters of the Dirichlet distribution. Appl Stat 40:365–374
Nazareth L (1979) A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J Numer Anal 16:794–800
Nedelman J, Wallenius T (1986) Bernoulli trials, Poisson trials, surprising variances, and Jensen’s inequality. Am Stat 40:286–289
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J Roy Stat Soc A 135:370–384
Nemirovski AS, Todd MJ (2008) Interior-point methods for optimization. Acta Numerica 17:191–234
Nocedal J (1991) Theory of algorithms for unconstrained optimization. Acta Numerica 1991:199–242
Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. Springer, New York
Orchard T, Woodbury MA (1972) A missing information principle: theory and applications. In: Proceedings of the 6th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 697–715
Ortega JM (1990) Numerical analysis: a second course. Society for Industrial and Applied Mathematics, Philadelphia
Osher S, Burger M, Goldfarb D, Xu J, Yin W (2005) An iterative regularization method for total variation based image restoration. Multiscale Model Simul 4:460–489
Osher S, Mao Y, Dong B, Yin W (2011) Fast linearized Bregman iteration for compressive sensing and sparse denoising. Comm Math Sci 8:93–111
Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30–50
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47
Peressini AL, Sullivan FE, Uhl JJ Jr (1988) The mathematics of nonlinear programming. Springer, New York
Polya G (1954) Induction and analogy in mathematics. Volume I of mathematics and plausible reasoning. Princeton University Press, Princeton
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279–300
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in Fortran: the art of scientific computing, 2nd edn. Cambridge University Press, Cambridge
Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257–285
Ranola JM, Ahn S, Sehl ME, Smith DJ, Lange K (2010) A Poisson model for random multigraphs. Bioinformatics 26:2004–2011
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, Hoboken
Robertson T, Wright FT, Dykstra RL (1988) Order restricted statistical inference. Wiley, Hoboken
Romano G (1995) New results in subdifferential calculus with applications to convex analysis. Appl Math Optim 32:213–234
Rockafellar RT (1996) Convex analysis. Princeton University Press, Princeton
Royden HL (1988) Real analysis, 3rd edn. Macmillan, London
Rudin W (1979) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York
Rudin LI, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Physica D 60:259–268
Rustagi JS (1976) Variational methods in statistics. Academic, New York
Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton
Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proc IEEE 90:1803–1810
Sagan H (1969) Introduction to the calculus of variations. McGraw-Hill, New York
Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307–1330
Schmidt M, van den Berg E, Friedlander MP, Murphy K (2009) Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: van Dyk D, Welling M (eds) Proceedings of the twelfth international conference on artificial intelligence and statistics (AISTATS), vol 5, pp 456–463
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT, Cambridge
Seber GAF, Lee AJ (2003) Linear regression analysis, 2nd edn. Wiley, Hoboken
Segel LA (1977) Mathematics applied to continuum mechanics. Macmillan, New York
Seneta E (1973) Non-negative matrices: an introduction to theory and applications. Wiley, Hoboken
Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge, pp 1065–1073
Silvapulle MJ, Sen PK (2005) Constrained statistical inference. Wiley, Hoboken
Sinkhorn R (1967) Diagonal equivalence to matrices with prescribed row and column sums. Am Math Mon 74:402–405
Sion M (1958) On general minimax theorems. Pac J Math 8:171–176
Smith CAB (1957) Counting methods in genetical statistics. Ann Hum Genet 21:254–276
Smith DR (1974) Variational methods in optimization. Dover, Mineola
Sorensen DC (1997) Minimization of a large-scale quadratic function subject to spherical constraints. SIAM J Optim 7:141–161
Srebro N, Jaakkola T (2003) Weighted low-rank approximations. In: Machine learning international workshop conference 2003. AAAI Press, 20:720–727
Steele JM (2004) The Cauchy-Schwarz master class: an introduction to the art of inequalities. Cambridge University Press and the Mathematical Association of America, Cambridge
Stein EM, Shakarchi R (2003) Complex analysis. Princeton University Press, Princeton
Stern RJ, Wolkowicz H (1995) Indefinite trust region subproblems and nonsymmetric eigenvalue perturbations. SIAM J Optim 5:286–313
Stoer J, Bulirsch R (2002) Introduction to numerical analysis, 3rd edn. Springer, New York
Strang G (1986) Introduction to applied mathematics. Wellesley-Cambridge, Wellesley
Strang G (1986) The fundamental theorem of linear algebra. Am Math Mon 100:848–855
Strang G (2003) Introduction to linear algebra, 3rd edn. Wellesley-Cambridge, Wellesley
Swartz C, Thomson BS (1988) More on the fundamental theorem of calculus. Am Math Mon 95:644–648
Tanner MA (1993) Tools for statistical inference: methods for the exploration of posterior distributions and likelihood functions, 2nd edn. Springer, New York
Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ1 norm. Geophysics 44:39–52
Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Oper Res 17:670–690
Theobald CM (1975) An inequality for the trace of the product of two symmetric matrices. Math Proc Camb Phil Soc 77:265–267
Thompson HB (1989) Taylor’s theorem using the generalized Riemann integral. Am Math Mon 96:346–350
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J Roy Stat Soc B 67:91–108
Tikhomirov VM (1990) Stories about maxima and minima. American Mathematical Society, Providence
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, Hoboken
Trefethen LN, Bau D (1997) Numerical linear algebra. SIAM, Philadelphia
Uherka DJ, Sergott AM (1977) On the continuous dependence of the roots of a polynomial on its coefficients. Am Math Mon 84:368–370
Vandenberghe L, Boyd S, Wu S (1998) Determinant maximization with linear matrix inequality constraints. SIAM J Matrix Anal Appl 19:499–533
Van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005–4. CITO, Arnhem
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Vardi Y, Shepp LA, Kaufman L (1985) A statistical model for positron emission tomography. J Am Stat Assoc 80:8–37
Von Neumann J (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100:295–320
Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Proceedings of the sixth international conference on data mining (ICDM’06). IEEE Computer Society, Washington, DC, pp 690–700
Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148–159
Watson GA (1992) Characterization of the subdifferential of some matrix norms. Linear Algebra Appl 170:1039–1053
Weeks DE, Lange K (1989) Trials, tribulations, and triumphs of the EM algorithm in pedigree analysis. IMA J Math Appl Med Biol 6:209–232
Weiszfeld E (1937) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167:741 (Translated from the French original [Tohoku Math J 43:335–386 (1937)] and annotated by Frank Plastria)
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3:1439–1461
Whyte BM, Gold J, Dobson AJ, Cooper DA (1987) Epidemiology of acquired immunodeficiency syndrome in Australia. Med J Aust 147:65–69
Wright MH (2005) The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull Am Math Soc 42:39–56
Wu CF (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103
Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224–244
Wu TT, Lange K (2010) Multicategory vertex discriminant analysis for high-dimensional data. Ann Appl Stat 4:1698–1721
Yee PL, Výborný R (2000) The integral: an easy approach after Kurzweil and Henstock. Cambridge University Press, Cambridge
Yin W, Osher S, Goldfarb D, Darbon J (2008) Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing. SIAM J Imag Sci 1:143–168
Zhang Z, Lange K, Ophoff R, Sabatti C (2010) Reconstructing DNA copy number by penalized estimation and imputation. Ann Appl Stat 4:1749–1773
Zhou H, Lange K (2009) On the bumpy road to the dominant mode. Scand J Stat 37:612–631
Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19:645–665
Zhou H, Lange K (2012) A path algorithm for constrained estimation. J Comput Graph Stat, DOI 10.1080/10618600.2012.681248
Zhou H, Lange K (2012) Path following in the exact penalty method of convex programming (submitted)
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Lange, K. (2013). Feasibility and Duality. In: Optimization. Springer Texts in Statistics, vol 95. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5838-8_15
DOI: https://doi.org/10.1007/978-1-4614-5838-8_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5837-1
Online ISBN: 978-1-4614-5838-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)