
Frank–Wolfe algorithm


The Frank–Wolfe algorithm is a simple iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method,[1] the reduced gradient algorithm, and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[2] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves slightly towards a minimizer of this linear function (taken over the same domain).

Problem statement

Minimize $f(\mathbf{x})$
subject to $\mathbf{x} \in \mathcal{D}$,

where the function $f \colon \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable, and the feasible set $\mathcal{D} \subset \mathbb{R}^n$ is convex and bounded.

Algorithm

Figure: A step of the Frank–Wolfe algorithm.
Initialization. Let $k \leftarrow 0$, and let $\mathbf{x}_0$ be any point in $\mathcal{D}$.
Step 1. Direction-finding subproblem. Find $\mathbf{s}_k$ solving
Minimize $\mathbf{s}^{\mathsf{T}} \nabla f(\mathbf{x}_k)$
subject to $\mathbf{s} \in \mathcal{D}$.
Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $\mathbf{x}_k$.
Step 2. Step size determination. Set $\gamma \leftarrow \tfrac{2}{k+2}$, or alternatively find $\gamma$ that minimizes $f(\mathbf{x}_k + \gamma(\mathbf{s}_k - \mathbf{x}_k))$ subject to $0 \le \gamma \le 1$.
Step 3. Update. Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \gamma(\mathbf{s}_k - \mathbf{x}_k)$, let $k \leftarrow k + 1$, and go back to Step 1.
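
As a minimal illustrative sketch of these three steps (assuming a quadratic objective and the probability simplex as feasible set; the helper names `frank_wolfe` and `lmo_simplex` are chosen here for concreteness and are not part of the original algorithm description):

```python
import numpy as np

def lmo_simplex(grad):
    # Linear minimization oracle over the probability simplex:
    # the minimizer of s . grad over the simplex is the vertex
    # corresponding to the smallest gradient entry.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, lmo, num_iters=100):
    # Frank-Wolfe with the default step size gamma = 2 / (k + 2).
    x = x0.copy()
    for k in range(num_iters):
        s = lmo(grad_f(x))        # Step 1: direction-finding subproblem
        gamma = 2.0 / (k + 2.0)   # Step 2: step size
        x = x + gamma * (s - x)   # Step 3: update
    return x

# Example: minimize f(x) = 0.5 * ||A x - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
grad_f = lambda x: A.T @ (A @ x - b)
x0 = np.full(5, 0.2)              # uniform point, feasible in the simplex
x_final = frank_wolfe(grad_f, x0, lmo_simplex)
```

Because each update forms a convex combination of two feasible points, every iterate remains in $\mathcal{D}$ without any projection.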

Properties

While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective relative to the optimum is $O(1/k)$ after $k$ iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately.[3]

The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[4] as well as, for example, the optimization of minimum-cost flows in transportation networks.[5]
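
Continuing the illustrative simplex sketch above: there, each solution of the direction-finding subproblem is a single vertex, so after $k$ iterations the iterate is a convex combination of at most $k+1$ vertices, which can be checked directly:

```python
# Continuing the simplex example: run a few iterations and count nonzeros.
x = np.zeros(5)
x[0] = 1.0                           # start at a vertex of the simplex
for k in range(3):
    s = lmo_simplex(grad_f(x))       # s is a standard basis vector
    x = x + (2.0 / (k + 2.0)) * (s - x)
assert np.count_nonzero(x) <= 4      # at most k+1 = 4 nonzero coordinates
```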

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
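
For example, if the feasible set is a bounded polyhedron $\{\mathbf{s} : A_{\mathrm{ub}}\mathbf{s} \le \mathbf{b}_{\mathrm{ub}}\}$, the direction-finding subproblem can be delegated to an off-the-shelf LP solver; a sketch using SciPy (the constraint matrices here are placeholders for whichever linear constraints define the set):

```python
import numpy as np
from scipy.optimize import linprog

def lmo_polytope(grad, A_ub, b_ub):
    # Direction-finding subproblem over {s : A_ub @ s <= b_ub}: a linear
    # program whose cost vector is the current gradient. The polytope must
    # be bounded for a finite minimizer to exist.
    res = linprog(c=grad, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x
```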

While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.[6]

Lower bounds on the solution value

Since $f$ is convex, $f$ always lies above its tangent plane at any point $\mathbf{x} \in \mathcal{D}$: for all $\mathbf{y} \in \mathcal{D}$,

$$f(\mathbf{y}) \ge f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{\mathsf{T}} \nabla f(\mathbf{x}).$$

This holds in particular for the (unknown) optimal solution $\mathbf{x}^*$. The best lower bound with respect to a given point $\mathbf{x}$ is given by

$$f(\mathbf{x}^*) \ge \min_{\mathbf{y} \in \mathcal{D}} \left( f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{\mathsf{T}} \nabla f(\mathbf{x}) \right) = f(\mathbf{x}) - \mathbf{x}^{\mathsf{T}} \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^{\mathsf{T}} \nabla f(\mathbf{x}).$$

The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $\mathbf{s}_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

$$l_k := \max\left( l_{k-1},\; f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^{\mathsf{T}} \nabla f(\mathbf{x}_k) \right).$$
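
As a sketch of how these bounds can be maintained in code (illustrative Python in the style of the earlier sketch; `f` and `grad_f` are assumed to evaluate the objective and its gradient, and `lmo` to solve the direction-finding subproblem):

```python
import numpy as np

def frank_wolfe_with_gap(f, grad_f, x0, lmo, num_iters=100, tol=1e-8):
    # Frank-Wolfe that also maintains the lower bound l_k on f(x*).
    x, lower = x0.copy(), -np.inf
    for k in range(num_iters):
        g = grad_f(x)
        s = lmo(g)
        # Lower bound from convexity: f(x*) >= f(x) + (s - x)^T grad f(x).
        lower = max(lower, f(x) + (s - x) @ g)
        # The gap f(x) - lower bounds the suboptimality f(x) - f(x*),
        # so it can serve as a stopping criterion.
        if f(x) - lower <= tol:
            break
        x = x + (2.0 / (k + 2.0)) * (s - x)
    return x, lower
```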

Notes

  1. ^ Levitin, E. S.; Polyak, B. T. (1966). "Constrained minimization methods". USSR Computational Mathematics and Mathematical Physics. 6 (5): 1–50. doi:10.1016/0041-5553(66)90114-5.
  2. ^ Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly. 3 (1–2): 95–110. doi:10.1002/nav.3800030109.
  3. ^ Dunn, J. C.; Harshbarger, S. (1978). "Conditional gradient algorithms with open loop step size rules". Journal of Mathematical Analysis and Applications. 62 (2): 432–444. doi:10.1016/0022-247X(78)90137-3.
  4. ^ Clarkson, K. L. (2010). "Coresets, sparse greedy approximation, and the Frank–Wolfe algorithm". ACM Transactions on Algorithms. 6 (4): Article 63. doi:10.1145/1824777.1824783.
  5. ^ Fukushima, M. (1984). "A modified Frank–Wolfe algorithm for solving the traffic assignment problem". Transportation Research Part B: Methodological. 18 (2): 169–177. doi:10.1016/0191-2615(84)90029-8.
  6. ^ Bertsekas, Dimitri (2003). Nonlinear Programming. Athena Scientific. p. 222. ISBN 1-886529-00-0.
