MM algorithm
The MM algorithm is an iterative optimization method that exploits the convexity of a function in order to find its maxima or minima. "MM" stands for "majorize-minimization" or "minorize-maximization", depending on whether minimization or maximization is desired. MM itself is not an algorithm, but a description of how to construct an optimization algorithm.
The EM algorithm can be treated as a special case of the MM algorithm. However, the EM algorithm usually involves complex conditional expectations and extensive analytical work, whereas the MM algorithm focuses mainly on convexity and inequalities, making it comparatively easy to understand and apply in most cases.
History
The original idea of the MM algorithm can be traced back at least to 1970, when Ortega and Rheinboldt were conducting studies related to line search methods.[1] The same idea kept reappearing under different guises in different areas until 2000, when Hunter and Lange put it into a general framework and coined the name "MM" for the first time.[2] Recent studies have shown that it can be used in a wide range of contexts, including mathematics, statistics, machine learning, and engineering.
How it works
The MM algorithm works by finding a surrogate function that minorizes or majorizes the objective function. Optimizing the surrogate function drives the objective function upward or downward until a local optimum is reached.
Take the minorize-maximization version for example.
Let $f(\theta)$ be the objective concave function to be maximized. At step $m$ of the algorithm, $m = 0, 1, \ldots$, the constructed function $g(\theta \mid \theta_m)$ will be called the minorized version of the objective function (the surrogate function) at $\theta_m$ if

$g(\theta \mid \theta_m) \le f(\theta)$ for all $\theta$, and

$g(\theta_m \mid \theta_m) = f(\theta_m)$.

Then we maximize $g(\theta \mid \theta_m)$ instead of $f(\theta)$, and let

$\theta_{m+1} = \arg\max_{\theta} g(\theta \mid \theta_m)$.

The above iterative method guarantees that $f(\theta_m)$ converges to a local optimum or a saddle point as $m$ goes to infinity, because by construction

$f(\theta_{m+1}) \ge g(\theta_{m+1} \mid \theta_m) \ge g(\theta_m \mid \theta_m) = f(\theta_m)$.
The progression of the iterates $\theta_m$ and of the surrogate functions relative to the objective function is shown in the figure.
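To make the iteration concrete, here is a minimal minorize-maximization sketch in Python (an illustrative example, not taken from the article; the toy objective and data are assumptions). The objective $f(\theta) = -\lVert A\theta - b \rVert^2$ is concave with an $L$-Lipschitz gradient, so a quadratic minorizer is available in closed form, and maximizing it reproduces a gradient-ascent step.

```python
import numpy as np

# Minorize-maximization sketch: maximize the concave objective
#   f(theta) = -||A theta - b||^2,
# whose gradient is L-Lipschitz with L = 2 * lambda_max(A^T A). The quadratic
#   g(theta | theta_m) = f(theta_m) + grad_f(theta_m)^T (theta - theta_m)
#                        - (L/2) * ||theta - theta_m||^2
# satisfies g <= f everywhere and g(theta_m | theta_m) = f(theta_m), so it is
# a valid minorizer; its exact maximizer is theta_m + grad_f(theta_m) / L.

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # assumed toy data
b = rng.standard_normal(20)

def f(theta):
    r = A @ theta - b
    return -r @ r

def grad_f(theta):
    return -2.0 * A.T @ (A @ theta - b)

L = 2.0 * np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of grad f

theta = np.zeros(3)
for m in range(500):
    theta_next = theta + grad_f(theta) / L     # exact argmax of the surrogate
    assert f(theta_next) >= f(theta) - 1e-9    # ascent: f(theta_{m+1}) >= f(theta_m)
    theta = theta_next

print(theta)                                   # approaches the least-squares fit
print(np.linalg.lstsq(A, b, rcond=None)[0])
```

Any surrogate satisfying the two conditions above yields the same ascent guarantee; the quadratic choice is simply the easiest to maximize in closed form.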

Ways to construct surrogate functions
In principle, any inequality can be used to construct the desired majorized or minorized version of the objective function, but there are several typical choices (a worked sketch follows the list):
- Jensen's inequality
- Convexity inequality
- Cauchy–Schwarz inequality
- Inequality of arithmetic and geometric means
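As an illustration of the last construction (a sketch under assumed toy data; the geometric-median example itself is not from the article): the inequality of arithmetic and geometric means majorizes each Euclidean norm by a quadratic, and minimizing the resulting surrogate yields the classic Weiszfeld iteration for the geometric median.

```python
import numpy as np

# Majorize-minimization via the AM-GM inequality. For the objective
#   f(theta) = sum_i ||theta - x_i||   (the geometric-median problem),
# AM-GM gives  ||theta - x_i|| <= ||theta - x_i||^2 / (2 v_i) + v_i / 2
# with v_i = ||theta_m - x_i||, and equality at theta = theta_m. Summing over
# i gives a quadratic majorizer whose minimizer is a weighted average of the
# points; iterating this is known as Weiszfeld's algorithm.

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 2))    # assumed data points x_i

def f(theta):
    return np.linalg.norm(x - theta, axis=1).sum()

theta = x.mean(axis=0)              # start from the centroid
for m in range(100):
    v = np.linalg.norm(x - theta, axis=1)      # v_i = ||theta_m - x_i||
    w = 1.0 / np.maximum(v, 1e-12)             # guard against a zero distance
    theta_next = (w[:, None] * x).sum(axis=0) / w.sum()  # argmin of surrogate
    assert f(theta_next) <= f(theta) + 1e-9    # descent: f(theta_{m+1}) <= f(theta_m)
    theta = theta_next

print(theta, f(theta))
```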
References
- ^ Ortega, J.M.; Rheinboldt, W.C. (1970). Iterative Solutions of Nonlinear Equations in Several Variables. New York: Academic Press. pp. 253–255.
- ^ Hunter, D.R.; Lange, K. (2000). "Quantile Regression via an MM Algorithm". Journal of Computational and Graphical Statistics. 9: 60–77.