# The MM Algorithm

Chapter
Part of the Statistics and Computing book series (SCO)

## Abstract

Most practical optimization problems defy exact solution. In this chapter we discuss an optimization method that relies heavily on convexity arguments and is particularly useful in high-dimensional problems such as image reconstruction [27]. This iterative method is called the MM algorithm. One of the virtues of the MM acronym is that it does double duty. In minimization problems, the first M stands for majorize and the second M for minimize. In maximization problems, the first M stands for minorize and the second M for maximize. When it is successful, the MM algorithm substitutes a simple optimization problem for a difficult optimization problem. Simplicity can be attained by (a) avoiding large matrix inversions, (b) linearizing an optimization problem, (c) separating the variables of an optimization problem, (d) dealing with equality and inequality constraints gracefully, and (e) turning a nondifferentiable problem into a smooth problem. In simplifying the original problem, we pay the price of iteration, sometimes with a slower rate of convergence.
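As a minimal illustration of points (a) and (e), consider minimizing the nondifferentiable objective \(f(x) = \sum_i |x - a_i|\), whose minimum is the sample median. The standard quadratic majorizer \(|x - a_i| \le \frac{(x - a_i)^2}{2|x_n - a_i|} + \frac{|x_n - a_i|}{2}\), with equality at the current iterate \(x_n\), yields a smooth surrogate whose minimizer is a weighted mean. The sketch below (not taken from the chapter; the function name and the `eps` safeguard are our own choices) shows the resulting MM iteration:

```python
import numpy as np

def mm_median(a, x0, iters=200, eps=1e-12):
    """Minimize sum_i |x - a_i| by MM with a quadratic majorizer."""
    x = x0
    for _ in range(iters):
        # Majorize |x - a_i| at x_n by (x - a_i)^2 / (2|x_n - a_i|) + |x_n - a_i|/2;
        # the majorizer touches the objective at x = x_n, so each surrogate
        # minimization drives the original objective downhill (the MM descent property).
        w = 1.0 / np.maximum(np.abs(x - a), eps)  # eps guards against division by zero
        # The smooth surrogate is a weighted sum of squares; its minimizer
        # is a weighted mean, so no nondifferentiable terms remain.
        x = np.sum(w * a) / np.sum(w)
    return x
```

For instance, on the data `[1, 2, 3, 4, 100]` the iterates converge to the median 3 from any starting point, even though the objective is nondifferentiable there.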

## Keywords

Projection Line · Surrogate Function · Random Graph Model · Cyclic Coordinate Descent · Transmission Tomography

## References

1. Becker MP, Yang I, Lange K (1997) EM algorithms without missing data. Stat Methods Med Res 6:37-53
2. Böhning D, Lindsay BG (1988) Monotonicity of quadratic approximation algorithms. Ann Instit Stat Math 40:641-663
3. Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324-345
4. De Leeuw J (1994) Block relaxation algorithms in statistics. In: Information Systems and Data Analysis, Bock HH, Lenski W, Richter MM, editors, Springer, New York, pp 308-325
5. De Leeuw J (2006) Some majorization techniques. Preprint series, UCLA Department of Statistics
6. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39:1-38
7. Dempster AP, Laird NM, Rubin DB (1980) Iteratively reweighted least squares for linear regression when the errors are normal/independent distributed. In: Multivariate Analysis V, Krishnaiah PR, editor, North Holland, Amsterdam, pp 35-57
8. De Pierro AR (1993) On the relation between the ISRA and EM algorithm for positron emission tomography. IEEE Trans Med Imaging 12:328-333
9. Geman S, McClure D (1985) Bayesian image analysis: an application to single photon emission tomography. Proc Stat Comput Sec, Amer Stat Assoc, Washington, DC, pp 12-18
10. Green P (1990) Bayesian reconstruction for emission tomography data using a modified EM algorithm. IEEE Trans Med Imaging 9:84-94
11. Grimmett GR, Stirzaker DR (1992) Probability and Random Processes, 2nd ed. Oxford University Press, Oxford
12. Heiser WJ (1995) Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Recent Advances in Descriptive Multivariate Analysis, Krzanowski WJ, editor, Clarendon Press, Oxford, pp 157-189
13. Herman GT (1980) Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. Springer, New York
14. Hoel PG, Port SC, Stone CJ (1971) Introduction to Probability Theory. Houghton Mifflin, Boston
15. Huber PJ (1981) Robust Statistics. Wiley, New York
16. Hunter DR (2004) MM algorithms for generalized Bradley-Terry models. Annals Stat 32:386-408
17. Hunter DR, Lange K (2004) A tutorial on MM algorithms. Amer Statistician 58:30-37
18. Karlin S, Taylor HM (1975) A First Course in Stochastic Processes, 2nd ed. Academic Press, New York
19. Keener JP (1993) The Perron-Frobenius theorem and the ranking of football teams. SIAM Review 35:80-93
20. Kent JT, Tyler DE, Vardi Y (1994) A curious likelihood identity for the multivariate t-distribution. Comm Stat Simulation 23:441-453
21. Kingman JFC (1993) Poisson Processes. Oxford University Press, Oxford
22. Lange K (1995) A gradient algorithm locally equivalent to the EM algorithm. J Roy Stat Soc B 57:425-437
23. Lange K (2002) Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. Springer, New York
24. Lange K (2004) Optimization. Springer, New York
25. Lange K, Carson R (1984) EM reconstruction algorithms for emission and transmission tomography. J Computer Assist Tomography 8:306-316
26. Lange K, Fessler JA (1995) Globally convergent algorithms for maximum a posteriori transmission tomography. IEEE Trans Image Processing 4:1430-1438
27. Lange K, Hunter D, Yang I (2000) Optimization transfer using surrogate objective functions (with discussion). J Computational Graphical Stat 9:1-59
28. Lange K, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. J Amer Stat Assoc 84:881-896
29. Lange K, Sinsheimer JS (1993) Normal/independent distributions and their applications in robust regression. J Comp Graph Stat 2:175-198
30. Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis. Wiley, New York
31. Luce RD (1977) The choice axiom after twenty years. J Math Psychology 15:215-233
32. McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions, 2nd ed. Wiley, New York
33. Merle G, Spath H (1974) Computational experiences with discrete Lp approximation. Computing 12:315-321
34. Rao CR (1973) Linear Statistical Inference and its Applications, 2nd ed. Wiley, New York
35. Sabatti C, Lange K (2002) Genomewide motif identification using a dictionary model. Proceedings IEEE 90:1803-1810
36. Schlossmacher EJ (1973) An iterative technique for absolute deviations curve fitting. J Amer Stat Assoc 68:857-859
37. Sha F, Saul LK, Lee DD (2003) Multiplicative updates for nonnegative quadratic programming in support vector machines. In: Advances in Neural Information Processing Systems 15, Becker S, Thrun S, Obermayer K, editors, MIT Press, Cambridge, MA, pp 1065-1073
38. Steele JM (2004) The Cauchy-Schwarz Master Class: An Introduction to the Art of Inequalities. Cambridge University Press and the Mathematical Association of America, Cambridge
39. van Ruitenburg J (2005) Algorithms for parameter estimation in the Rasch model. Measurement and Research Department Reports 2005-4, CITO, Arnhem, Netherlands
40. Wu TT, Lange K (2009) The MM alternative to EM. Stat Sci (in press)