Feasibility and Duality

Lange, Kenneth

doi:10.1007/978-1-4614-5838-8_15

Part of the book series: Springer Texts in Statistics (STS, volume 95)

Abstract

This chapter provides a concrete introduction to several advanced topics in optimization theory. Specifying an interior feasible point is the first issue that must be faced in applying a barrier method. Given an exterior point, Dykstra’s algorithm [21, 70, 79] finds the closest point in the intersection \(\cap_{i=0}^{r-1}C_{i}\) of a finite number of closed convex sets. If \(C_{i}\) is defined by the convex constraint \(h_{i}(\boldsymbol{x}) \leq 0\), then one obvious tactic for finding an interior point is to replace \(C_{i}\) by the set \(C_{i}(\epsilon) = \{\boldsymbol{x} : h_{i}(\boldsymbol{x}) \leq -\epsilon\}\) for some small \(\epsilon > 0\). Projecting onto the intersection of the \(C_{i}(\epsilon)\) then produces an interior point.
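
The tactic described in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes every constraint is linear, \(h_{i}(\boldsymbol{x}) = \boldsymbol{a}_{i}^{T}\boldsymbol{x} - b_{i}\), so that each tightened set \(C_{i}(\epsilon)\) is a half-space with a closed-form projection. The function names, the triangle example, the value of `eps`, and the fixed number of sweeps are all illustrative assumptions.

```python
def project_halfspace(z, a, c):
    """Euclidean projection of z onto the half-space {x : a.x <= c}."""
    excess = sum(ai * zi for ai, zi in zip(a, z)) - c
    if excess <= 0.0:
        return list(z)  # z already satisfies the constraint
    scale = excess / sum(ai * ai for ai in a)
    return [zi - scale * ai for ai, zi in zip(a, z)]


def dykstra(x0, projections, sweeps=500):
    """Dykstra's cyclic projections, keeping one correction vector per set."""
    x = list(x0)
    corrections = [[0.0] * len(x) for _ in projections]
    for _ in range(sweeps):
        for i, proj in enumerate(projections):
            # Add back the correction stored for set i, project, then
            # store the part of the step that the projection removed.
            z = [xj + ej for xj, ej in zip(x, corrections[i])]
            x = proj(z)
            corrections[i] = [zj - xj for zj, xj in zip(z, x)]
    return x


def shrunken_halfspaces(constraints, eps):
    """Map each constraint a.x - b <= 0 to a projector onto a.x <= b - eps."""
    return [
        (lambda z, a=a, b=b: project_halfspace(z, a, b - eps))
        for a, b in constraints
    ]


# Hypothetical example: the triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1,
# written as h_i(x) = a_i.x - b_i <= 0.
constraints = [([-1.0, 0.0], 0.0), ([0.0, -1.0], 0.0), ([1.0, 1.0], 1.0)]
eps = 0.05
interior = dykstra([2.0, -1.0], shrunken_halfspaces(constraints, eps))
print(interior)
```

Starting from the exterior point (2, -1), the iterates approach the vertex (0.9, 0.05) of the tightened triangle, which satisfies all three original constraints strictly. For nonlinear \(h_{i}\) the half-space projection would have to be replaced by a projection onto each \(C_{i}(\epsilon)\), which is generally not available in closed form.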

References

Acosta E, Delgado C (1994) Fréchet versus Carathéodory. Am Math Mon 101:332–338
Acton FS (1990) Numerical methods that work. Mathematical Association of America, Washington, DC
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley, Hoboken
Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute value estimates for a simple linear regression problem. Appl Stat 27:363–366
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: 2007 symposium on discrete algorithms (SODA). Society for Industrial and Applied Mathematics, Philadelphia
Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972) Statistical inference under order restrictions; the theory and application of isotonic regression. Wiley, New York
Bartle RG (1996) Return to the Riemann integral. Am Math Mon 103:625–632
Baum LE (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8
Bauschke HH, Lewis AS (2000) Dykstra’s algorithm with Bregman projections: a convergence proof. Optimization 48:409–427
Beltrami EJ (1970) An algorithmic approach to nonlinear analysis and optimization. Academic, New York
Berry MW, Drmac Z, Jessup ER (1999) Matrices, vector spaces, and information retrieval. SIAM Rev 41:335–362
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont
Bertsekas DP (2009) Convex optimization theory. Athena Scientific, Belmont
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT, Cambridge
Bliss GA (1925) Calculus of variations. Mathematical Society of America, Washington, DC
Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Inst Stat Math 40:641–663
Borwein JM, Lewis AS (2000) Convex analysis and nonlinear optimization: theory and examples. Springer, New York
Botsko MW, Gosser RA (1985) On the differentiability of functions of several variables. Am Math Mon 92:663–665
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Boyd S, Kim SJ, Vandenberghe L, Hassibi A (2007) A tutorial on geometric programming. Optim Eng 8:67–127
Boyle JP, Dykstra RL (1985) A method for finding projections onto the intersection of convex sets in Hilbert space. In: Advances in order restricted statistical inference. Lecture notes in statistics. Springer, New York, pp 28–47
Bradley EL (1973) The equivalence of maximum likelihood and weighted least squares estimates in the exponential family. J Am Stat Assoc 68:199–200
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324–345
Bregman LM (1965) The method of successive projection for finding a common point of convex sets. Sov Math Dokl 6:688–692
Bregman LM (1967) The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7:200–217
Bregman LM, Censor Y, Reich S (2000) Dykstra’s algorithm as the nonlinear extension of Bregman’s optimization method. J Convex Anal 6:319–333
Brent RP (1973) Some efficient algorithms for solving systems of nonlinear equations. SIAM J Numer Anal 10:327–344
Brezhneva OA, Tret’yakov AA, Wright SE (2010) A simple and elementary proof of the Karush-Kuhn-Tucker theorem for inequality-constrained optimization. Optim Lett 3:7–10
Bridger M, Stolzenberg G (1999) Uniform calculus and the law of bounded change. Am Math Mon 106:628–635
Brinkhuis J, Tikhomirov V (2005) Optimization: insights and applications. Princeton University Press, Princeton
Brophy JF, Smith PW (1988) Prototyping Karmarkar’s algorithm using MATH-PROTRAN. IMSL Dir 5:2–3
Broyden CG (1965) A class of methods for solving nonlinear simultaneous equations. Math Comput 19:577–593
Byrd RH, Nocedal J (1989) A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J Numer Anal 26:727–739
Byrne CL (2009) A first course in optimization. Department of Mathematical Sciences, University of Massachusetts Lowell, Lowell
Cai J-F, Candès EJ, Shen Z (2008) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20:1956–1982
Candès EJ, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2351
Candès EJ, Tao T (2009) The power of convex relaxation: near-optimal matrix completion. IEEE Trans Inform Theor 56:2053–2080
Candès EJ, Romberg J, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Comm Pure Appl Math 59:1207–1223
Candès EJ, Wakin M, Boyd S (2007) Enhancing sparsity by reweighted ℓ₁ minimization. J Fourier Anal Appl 14:877–905
Carathéodory C (1954) Theory of functions of a complex variable, vol 1. Chelsea, New York
Censor Y, Zenios SA (1992) Proximal minimization with D-functions. J Optim Theor Appl 73:451–464
Censor Y, Chen W, Combettes PL, Davidi R, Herman GT (2012) On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput Optim Appl 51:1065–1088
Charnes A, Frome EL, Yu PL (1976) The equivalence of generalized least squares and maximum likelihood in the exponential family. J Am Stat Assoc 71:169–171
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivariate Anal 100:1367–1383
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20:33–61
Cheney W (2001) Analysis for applied mathematics. Springer, New York
Choi SC, Wette R (1969) Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics 11:683–690
Ciarlet PG (1989) Introduction to numerical linear algebra and optimization. Cambridge University Press, Cambridge
Claerbout J, Muir F (1973) Robust modeling with erratic data. Geophysics 38:826–844
Clarke CA, Price Evans DA, McConnell RB, Sheppard PM (1959) Secretion of blood group antigens and peptic ulcers. Br Med J 1:603–607
Conn AR, Gould NIM, Toint PL (1991) Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math Program 50:177–195
Conte SD, de Boor C (1972) Elementary numerical analysis. McGraw-Hill, New York
Cox DR (1970) Analysis of binary data. Methuen, London
Danskin JM (1966) The theory of max-min, with applications. SIAM J Appl Math 14:641–664
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1457
Davidon WC (1959) Variable metric methods for minimization. AEC Research and Development Report ANL–5990, Argonne National Laboratory, Argonne
Davis JA, Smith TW (2008) General social surveys, 1972–2008 [machine-readable data file]. Roper Center for Public Opinion Research, University of Connecticut, Storrs
Debreu G (1952) Definite and semidefinite quadratic forms. Econometrica 20:295–300
de Leeuw J (1994) Block relaxation algorithms in statistics. In: Bock HH, Lenski W, Richter MM (eds) Information systems and data analysis. Springer, New York, pp 308–325
de Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
de Leeuw J, Heiser WJ (1980) Multidimensional scaling with restrictions on the configuration. In: Krishnaiah PR (ed) Multivariate analysis, vol V. North-Holland, Amsterdam, pp 501–522
de Leeuw J, Lange K (2009) Sharp quadratic majorization in one dimension. Comput Stat Data Anal 53:2471–2484
Delfour MC (2012) Introduction to optimization and semidifferential calculus. SIAM, Philadelphia
Demmel J (1997) Applied numerical linear algebra. SIAM, Philadelphia
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1–38
Dennis JE Jr, Schnabel RB (1996) Numerical methods for unconstrained optimization and nonlinear equations. SIAM, Philadelphia
De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imag 12:328–333
DePree JD, Swartz CW (1988) Introduction to real analysis. Wiley, Hoboken
de Souza PN, Silva J-N (2001) Berkeley problems in mathematics, 2nd edn. Springer, New York
Deutsch F (2001) Best approximation in inner product spaces. Springer, New York
Devijver PA (1985) Baum’s forward-backward algorithm revisited. Pattern Recogn Lett 3:369–373
Ding C, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32:45–55
Dobson AJ (1990) An introduction to generalized linear models. Chapman & Hall, London
Donoho DL (2006) Compressed sensing. IEEE Trans Inform Theor 52:1289–1306
Donoho D, Johnstone I (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81:425–455
Duan J-C, Simonato J-G (1993) Multiplicity of solutions in maximum likelihood factor analysis. J Stat Comput Simul 47:37–47
Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the ℓ₁-ball for learning in high dimensions. In: Proceedings of the 25th international conference on machine learning (ICML 2008). ACM, New York, pp 272–279
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Dykstra RL (1983) An algorithm for restricted least squares estimation. J Am Stat Assoc 78:837–842
Edgeworth FY (1887) On observations relating to several quantities. Hermathena 6:279–285
Edgeworth FY (1888) On a new method of reducing observations relating to several quantities. Phil Mag 25:184–191
Edwards CH Jr (1973) Advanced calculus of several variables. Academic, New York
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–499
Ekeland I (1974) On the variational principle. J Math Anal Appl 47:324–353
Elsner L, Koltracht L, Neumann M (1992) Convergence of sequential and asynchronous nonlinear paracontractions. Numer Math 62:305–319
Everitt BS (1977) The analysis of contingency tables. Chapman & Hall, London
Fang S-C, Puthenpura S (1993) Linear optimization and extensions: theory and algorithms. Prentice-Hall, Englewood Cliffs
Fazel M, Hindi M, Boyd S (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Contr Conf 3:2156–2162
Feller W (1971) An introduction to probability theory and its applications, vol 2, 2nd edn. Wiley, Hoboken
Fessler JA, Clinthorne NH, Rogers WL (1993) On complete-data spaces for PET reconstruction algorithms. IEEE Trans Nucl Sci 40:1055–1061
Fiacco AV, McCormick GP (1968) Nonlinear programming: sequential unconstrained minimization techniques. Wiley, Hoboken
Fletcher R (2000) Practical methods of optimization, 2nd edn. Wiley, Hoboken
Fletcher R, Powell MJD (1963) A rapidly convergent descent method for minimization. Comput J 6:163–168
Fletcher R, Reeves CM (1964) Function minimization by conjugate gradients. Comput J 7:149–154
Flury B, Zoppè A (2000) Exercises in EM. Am Stat 54:207–209
Forsgren A, Gill PE, Wright MH (2002) Interior point methods for nonlinear optimization. SIAM Rev 44:523–597
Franklin J (1983) Mathematical methods of economics. Am Math Mon 90:229–244
Friedman J, Hastie T, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
Friedman J, Hastie T, Tibshirani R (2009) Regularized paths for generalized linear models via coordinate descent. Technical Report, Department of Statistics, Stanford University
Fu WJ (1998) Penalized regressions: the bridge versus the lasso. J Comput Graph Stat 7:397–416
Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498
Gelfand IM, Fomin SV (1963) Calculus of variations. Prentice-Hall, Englewood Cliffs
Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 12–18
Gifi A (1990) Nonlinear multivariate analysis. Wiley, Hoboken
Gill PE, Murray W, Wright MH (1991) Numerical linear algebra and optimization, vol 1. Addison-Wesley, Redwood City
Goldstein T, Osher S (2009) The split Bregman method for ℓ₁-regularized problems. SIAM J Imag Sci 2:323–343
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Gordon RA (1998) The use of tagged partitions in elementary real analysis. Am Math Mon 105:107–117
Gould NIM (2008) How good are projection methods for convex feasibility problems? Comput Optim Appl 40:1–12
Green PJ (1984) Iteratively reweighted least squares for maximum likelihood estimation and some robust and resistant alternatives (with discussion). J Roy Stat Soc B 46:149–192
Green PJ (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imag 9:84–94
Green PJ (1990) On use of the EM algorithm for penalized likelihood estimation. J Roy Stat Soc B 52:443–452
Grimmett GR, Stirzaker DR (1992) Probability and random processes, 2nd edn. Oxford University Press, Oxford
Groenen PJF, Nalbantov G, Bioch JC (2007) Nonlinear support vector machines through iterative majorization and I-splines. In: Lenz HJ, Decker R (eds) Studies in classification, data analysis, and knowledge organization. Springer, Heidelberg, pp 149–161
Guillemin V, Pollack A (1974) Differential topology. Prentice-Hall, Englewood Cliffs
Güler O (2010) Foundations of optimization. Springer, New York
Hämmerlin G, Hoffmann K-H (1991) Numerical mathematics. Springer, New York
Hardy GH, Littlewood JE, Pólya G (1952) Inequalities, 2nd edn. Cambridge University Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
He L, Marquina A, Osher S (2005) Blind deconvolution using TV regularization and Bregman iteration. Int J Imag Syst Technol 15:74–83
Heiser WJ (1987) Correspondence analysis with least absolute residuals. Comput Stat Data Anal 5:337–356
Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski WJ (ed) Recent advances in descriptive multivariate analysis. Clarendon, Oxford, pp 157–189
Henrici P (1982) Essentials of numerical analysis with pocket calculator demonstrations. Wiley, Hoboken
Herman GT (1980) Image reconstruction from projections: the fundamentals of computerized tomography. Springer, New York
Hestenes MR (1981) Optimization theory: the finite dimensional case. Robert E Krieger Publishing, Huntington
Hestenes MR, Karush WE (1951) A method of gradients for the calculation of the characteristic roots and vectors of a real symmetric matrix. J Res Natl Bur Stand 47:471–478
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 29:409–439
Higham NJ (2008) Functions of matrices: theory and computation. SIAM, Philadelphia
Hille E (1959) Analytic function theory, vol 1. Blaisdell, New York
Hiriart-Urruty J-B (1986) When is a point x satisfying ∇f(x) = 0 a global minimum of f(x)? Am Math Mon 93:556–558
Hiriart-Urruty J-B, Lemaréchal C (2001) Fundamentals of convex analysis. Springer, New York
Hochstadt H (1986) The functions of mathematical physics. Dover, New York
Hoel PG, Port SC, Stone CJ (1971) Introduction to probability theory. Houghton Mifflin, Boston
Hoffman K (1975) Analysis in Euclidean space. Prentice-Hall, Englewood Cliffs
Hoffman K, Kunze R (1971) Linear algebra, 2nd edn. Prentice-Hall, Englewood Cliffs
Horn RA, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
Horn RA, Johnson CR (1991) Topics in matrix analysis. Cambridge University Press, Cambridge
Householder AS (1975) The theory of matrices in numerical analysis. Dover, New York
Hrusa W, Troutman JL (1981) Elementary characterization of classical minima. Am Math Mon 88:321–327
Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Ann Stat 32:386–408
Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9:60–77
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58:30–37
Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann Stat 33:1617–1642
Jamshidian M, Jennrich RI (1995) Acceleration of the EM algorithm by using quasi-Newton methods. J Roy Stat Soc B 59:569–587
Jamshidian M, Jennrich RI (1997) Quasi-Newton acceleration of the EM algorithm. J Roy Stat Soc B 59:569–587
Jennrich RI, Moore RH (1975) Maximum likelihood estimation by means of nonlinear least squares. In: Proceedings of the statistical computing section. American Statistical Association, Washington, DC, pp 57–65
Jia R-Q, Zhao H, Zhao W (2009) Convergence analysis of the Bregman method for the variational model of image denoising. Appl Comput Harmon Anal 27:367–379
Karlin S, Taylor HM (1975) A first course in stochastic processes, 2nd edn. Academic, New York
Karush W (1939) Minima of functions of several variables with inequalities as side conditions. Master’s Thesis, Department of Mathematics, University of Chicago, Chicago
Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Rev 35:80–93
Kelley CT (1999) Iterative methods for optimization. SIAM, Philadelphia
Khalfan HF, Byrd RH, Schnabel RB (1993) A theoretical and experimental study of the symmetric rank-one update. SIAM J Optim 3:1–24
Kiers HAL (1997) Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62:251–266
Kingman JFC (1993) Poisson processes. Oxford University Press, Oxford
Komiya H (1988) Elementary proof for Sion’s minimax theorem. Kodai Math J 11:5–7
Kosowsky JJ, Yuille AL (1994) The invisible hand algorithm: solving the assignment problem with statistical physics. Neural Network 7:477–490
Kruskal JB (1964) Nonmetric multidimensional scaling: a numerical method. Psychometrika 29:115–129
Kruskal JB (1965) Analysis of factorial experiments by estimating monotone transformations of the data. J Roy Stat Soc B 27:251–263
Ku HH, Kullback S (1974) Log-linear models in contingency table analysis. Biometrics 10:452–458
Kuhn S (1991) The derivative à la Carathéodory. Am Math Mon 98:40–44
Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceedings of the second Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley
Lange K (1994) An adaptive barrier method for convex programming. Meth Appl Anal 1:392–402
Lange K (1995) A gradient algorithm locally equivalent to the EM algorithm. J Roy Stat Soc B 57:425–437
Lange K (1995) A quasi-Newton acceleration of the EM algorithm. Stat Sin 5:1–18
Lange K (2002) Mathematical and statistical methods for genetic analysis, 2nd edn. Springer, New York
Lange K (2010) Numerical analysis for statisticians, 2nd edn. Springer, New York
Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Comput Assist Tomogr 8:306–316
Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Process 4:1430–1438
Lange K, Wu T (2008) An MM algorithm for multicategory vertex discriminant analysis. J Comput Graph Stat 17:527–544
Lange K, Zhou H (2012) MM algorithms for geometric and signomial programming. Math Program, Series A, DOI 10.1007/s10107-012-0612-1
Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Comput Graph Stat 9:1–59
Lax PD (2007) Linear algebra and its applications, 2nd edn. Wiley, Hoboken
Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88:365–411
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13:556–562
Lehmann EL (1986) Testing statistical hypotheses, 2nd edn. Wiley, Hoboken
Levina E, Rothman A, Zhu J (2008) Sparse estimation of large covariance matrices via a nested lasso penalty. Ann Appl Stat 2:245–263
Li Y, Arce GR (2004) A maximum likelihood approach to least absolute deviation regression. EURASIP J Appl Signal Process 2004:1762–1769
Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, Hoboken
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J Roy Stat Soc B 44:226–233
Luce RD (1959) Individual choice behavior: a theoretical analysis. Wiley, Hoboken
Luce RD (1977) The choice axiom after twenty years. J Math Psychol 15:215–233
Luenberger DG (1984) Linear and nonlinear programming, 2nd edn. Addison-Wesley, Reading
Magnus JR, Neudecker H (1988) Matrix differential calculus with applications in statistics and econometrics. Wiley, Hoboken
Maher MJ (1982) Modelling association football scores. Stat Neerl 36:109–118
Mangasarian OL, Fromovitz S (1967) The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J Math Anal Appl 17:37–47
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic, New York
Marsden JE, Hoffman MJ (1993) Elementary classical analysis, 2nd edn. W H Freeman & Co, New York
Mazumder R, Hastie T, Tibshirani R (2010) Spectral regularization algorithms for learning large incomplete matrices. J Mach Learn Res 11:2287–2322
McLachlan GJ, Do K-A, Ambroise C (2004) Analyzing microarray gene expression data. Wiley, Hoboken
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, Hoboken
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, Hoboken
McLeod RM (1980) The generalized Riemann integral. Mathematical Association of America, Washington, DC
McShane EJ (1973) The Lagrange multiplier rule. Am Math Mon 80:922–925
Meyer RR (1976) Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J Comput Syst Sci 12:108–121
Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex in Rⁿ. J Optim Theor Appl 50:195–200
Miller KS (1987) Some eclectic matrix theory. Robert E Krieger Publishing, Malabar
Moré JJ, Sorensen DC (1983) Computing a trust region step. SIAM J Sci Stat Comput 4:553–572
Narayanan A (1991) Algorithm AS 266: maximum likelihood estimation of the parameters of the Dirichlet distribution. Appl Stat 40:365–374
Nazareth L (1979) A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J Numer Anal 16:794–800
Nedelman J, Wallenius T (1986) Bernoulli trials, Poisson trials, surprising variances, and Jensen’s inequality. Am Stat 40:286–289
Nelder JA, Wedderburn RWM (1972) Generalized linear models. J Roy Stat Soc A 135:370–384
Nemirovski AS, Todd MJ (2008) Interior-point methods for optimization. Acta Numerica 17:191–234
Nocedal J (1991) Theory of algorithms for unconstrained optimization. Acta Numerica 1991:199–242
Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. Springer, New York
Orchard T, Woodbury MA (1972) A missing information principle: theory and applications. In: Proceedings of the 6th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 697–715
Ortega JM (1990) Numerical analysis: a second course. Society for Industrial and Applied Mathematics, Philadelphia
Osher S, Burger M, Goldfarb D, Xu J, Yin W (2005) An iterative regularization method for total variation based image restoration. Multiscale Model Simul 4:460–489
Osher S, Mao T, Dong B, Yin W (2011) Fast linearized Bregman iteration for compressive sensing and sparse denoising. Comm Math Sci 8:93–111
Park MY, Hastie T (2008) Penalized logistic regression for detecting gene interactions. Biostatistics 9:30–50
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416:29–47
Peressini AL, Sullivan FE, Uhl JJ Jr (1988) The mathematics of nonlinear programming. Springer, New York
Polya G (1954) Induction and analogy in mathematics. Volume I of mathematics and plausible reasoning. Princeton University Press, Princeton
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12:279–300
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in Fortran: the art of scientific computing, 2nd edn. Cambridge University Press, Cambridge
Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257–285
Ranola JM, Ahn S, Sehl ME, Smith DJ, Lange K (2010) A Poisson model for random multigraphs. Bioinformatics 26:2004–2011
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, Hoboken
Robertson T, Wright FT, Dykstra RL (1988) Order restricted statistical inference. Wiley, Hoboken
Romano G (1995) New results in subdifferential calculus with applications to convex analysis. Appl Math Optim 32:213–234
Rockafellar RT (1996) Convex analysis. Princeton University Press, Princeton
Royden HL (1988) Real analysis, 3rd edn. Macmillan, London
Rudin W (1979) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York
Rudin LI, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Physica D 60:259–268
Rustagi JS (1976) Variational methods in statistics. Academic, New York
Ruszczyński A (2006) Nonlinear optimization. Princeton University Press, Princeton
Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proc IEEE 90:1803–1810
Sagan H (1969) Introduction to the calculus of variations. McGraw-Hill, New York
Santosa F, Symes WW (1986) Linear inversion of band-limited reflection seismograms. SIAM J Sci Stat Comput 7:1307–1330
Schmidt M, van den Berg E, Friedlander MP, Murphy K (2009) Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: van Dyk D, Welling M (eds) Proceedings of the twelfth international conference on artificial intelligence and statistics (AISTATS), vol 5, pp 456–463
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT, Cambridge
Seber GAF, Lee AJ (2003) Linear regression analysis, 2nd edn. Wiley, Hoboken
Segel LA (1977) Mathematics applied to continuum mechanics. Macmillan, New York
Seneta E (1973) Non-negative matrices: an introduction to theory and applications. Wiley, Hoboken
Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge, pp 1065–1073
Silvapulle MJ, Sen PK (2005) Constrained statistical inference. Wiley, Hoboken
Sinkhorn R (1967) Diagonal equivalence to matrices with prescribed row and column sums. Am Math Mon 74:402–405
Sion M (1958) On general minimax theorems. Pac J Math 8:171–176
Smith CAB (1957) Counting methods in genetical statistics. Ann Hum Genet 21:254–276
Smith DR (1974) Variational methods in optimization. Dover, Mineola
Sorensen DC (1997) Minimization of a large-scale quadratic function subject to spherical constraints. SIAM J Optim 7:141–161
Srebro N, Jaakkola T (2003) Weighted low-rank approximations. In: Machine learning international workshop conference 2003. AAAI Press, 20:720–727
Steele JM (2004) The Cauchy-Schwarz master class: an introduction to the art of inequalities. Cambridge University Press and the Mathematical Association of America, Cambridge
Stein EM, Shakarchi R (2003) Complex analysis. Princeton University Press, Princeton
Stern RJ, Wolkowicz H (1995) Indefinite trust region subproblems and nonsymmetric eigenvalue perturbations. SIAM J Optim 5:286–313
Stoer J, Bulirsch R (2002) Introduction to numerical analysis, 3rd edn. Springer, New York
Strang G (1986) Introduction to applied mathematics. Wellesley-Cambridge, Wellesley
Strang G (1993) The fundamental theorem of linear algebra. Am Math Mon 100:848–855
Strang G (2003) Introduction to linear algebra, 3rd edn. Wellesley-Cambridge, Wellesley
Swartz C, Thomson BS (1988) More on the fundamental theorem of calculus. Am Math Mon 95:644–648
Tanner MA (1993) Tools for statistical inference: methods for the exploration of posterior distributions and likelihood functions, 2nd edn. Springer, New York
Taylor H, Banks SC, McCoy JF (1979) Deconvolution with the ℓ₁ norm. Geophysics 44:39–52
Teboulle M (1992) Entropic proximal mappings with applications to nonlinear programming. Math Oper Res 17:670–690
Theobald CM (1975) An inequality for the trace of the product of two symmetric matrices. Math Proc Camb Phil Soc 77:265–267
Thompson HB (1989) Taylor’s theorem using the generalized Riemann integral. Am Math Mon 96:346–350
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J Roy Stat Soc B 67:91–108
Tikhomirov VM (1990) Stories about maxima and minima. American Mathematical Society, Providence
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, Hoboken
Trefethen LN, Bau D (1997) Numerical linear algebra. SIAM, Philadelphia
Uherka DJ, Sergott AM (1977) On the continuous dependence of the roots of a polynomial on its coefficients. Am Math Mon 84:368–370
Vandenberghe L, Boyd S, Wu S (1998) Determinant maximization with linear matrix inequality constraints. SIAM J Matrix Anal Appl 19:499–533
Van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005–4. CITO, Arnhem
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Vardi Y, Shepp LA, Kaufman L (1985) A statistical model for positron emission tomography. J Am Stat Assoc 80:8–37
Von Neumann J (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100:295–320
Wang L, Gordon MD, Zhu J (2006) Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Proceedings of the sixth international conference on data mining (ICDM’06). IEEE Computer Society, Washington, DC, pp 690–700
Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ (2006) Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2:148–159
Watson GA (1992) Characterization of the subdifferential of some matrix norms. Linear Algebra Appl 170:1039–1053
Weeks DE, Lange K (1989) Trials, tribulations, and triumphs of the EM algorithm in pedigree analysis. IMA J Math Appl Med Biol 6:209–232
Weiszfeld E (1937) On the point for which the sum of the distances to n given points is minimum. Ann Oper Res 167:7–41 (Translated from the French original [Tohoku Math J 43:335–386 (1937)] and annotated by Frank Plastria)
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3:1439–1461
Whyte BM, Gold J, Dobson AJ, Cooper DA (1987) Epidemiology of acquired immunodeficiency syndrome in Australia. Med J Aust 147:65–69
Wright MH (2005) The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull Am Math Soc 42:39–56
Wu CF (1983) On the convergence properties of the EM algorithm. Ann Stat 11:95–103
Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2:224–244
Wu TT, Lange K (2010) Multicategory vertex discriminant analysis for high-dimensional data. Ann Appl Stat 4:1698–1721
Yee PL, Výborný R (2000) The integral: an easy approach after Kurzweil and Henstock. Cambridge University Press, Cambridge
Yin W, Osher S, Goldfarb D, Darbon J (2008) Bregman iterative algorithms for ℓ₁-minimization with applications to compressed sensing. SIAM J Imag Sci 1:143–168
Zhang Z, Lange K, Ophoff R, Sabatti C (2010) Reconstructing DNA copy number by penalized estimation and imputation. Ann Appl Stat 4:1749–1773
Zhou H, Lange K (2009) On the bumpy road to the dominant mode. Scand J Stat 37:612–631
Zhou H, Lange K (2010) MM algorithms for some discrete multivariate distributions. J Comput Graph Stat 19:645–665
Zhou H, Lange K (2012) A path algorithm for constrained estimation. J Comput Graph Stat DOI 10.1080/10618600.2012.681248
Zhou H, Lange K (2012) Path following in the exact penalty method of convex programming (submitted)

Author information

Authors and Affiliations

Biomathematics, Human Genetics, Statistics, University of California, Los Angeles, CA, USA
Kenneth Lange

About this chapter

Cite this chapter

Lange, K. (2013). Feasibility and Duality. In: Optimization. Springer Texts in Statistics, vol 95. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5838-8_15

DOI: https://doi.org/10.1007/978-1-4614-5838-8_15
Published: 21 October 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5837-1
Online ISBN: 978-1-4614-5838-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
