Multiplicative weight update method
The multiplicative weight update method is a meta-algorithm. It is an algorithmic technique which “maintains a distribution on a certain set of interest, and updates it iteratively by multiplying the probability mass of elements by suitably chosen factors based on feedback obtained by running another algorithm on the distribution” [1]. It was discovered repeatedly in very diverse fields such as machine learning (AdaBoost, Winnow, Hedge), optimization (solving LPs), theoretical computer science (devising fast algorithms for LPs and SDPs), and game theory.
Name
“Multiplicative weights” refers to the iterative update rule used in algorithms derived from the multiplicative weight update method [2]. The rule appears under different names in the different fields where it was discovered or rediscovered.
Background
An algorithm named “Fictitious Play”, similar to the multiplicative weight update method, was proposed in game theory in the early 1950s. Grigoriadis and Khachiyan [4] applied a randomized variant of “Fictitious Play” to solve two-player zero-sum games efficiently using the multiplicative weights algorithm: a player allocates higher weight to the actions that had a better outcome and chooses his strategy relying on these weights. In machine learning, Littlestone applied the earliest form of the multiplicative weights update rule in his famous Winnow algorithm, which is similar to Minsky and Papert’s earlier perceptron learning algorithm.
Motivation
A binary decision needs to be made based on n experts’ opinions in order to attain an associated payoff. In the first round, all experts’ opinions have the same weight, and the decision maker makes the first decision based on the majority of the experts' predictions. Then, in each successive round, the decision maker updates the weight of each expert's opinion depending on the correctness of that expert's prior predictions.
Algorithm Analysis
Weighted Majority Algorithm [2][5]
Given the setting of prediction from expert advice, denote the experts as $1, 2, \dots, n$.

Initialization: Fix an $\eta \le 1/2$. For each expert, associate the weight $w_i^1 := 1$.

For $t = 1, 2, \dots, T$:

1. Make the prediction given by the weighted majority of the experts’ predictions based on the weights $w_1^t, \dots, w_n^t$. That is, make the binary choice with the higher total weight of experts advising it (breaking ties arbitrarily).
2. For every expert $i$ who predicted wrongly, decrease his weight for the next round by multiplying it by a factor of $(1-\eta)$:

- $w_i^{t+1} = (1-\eta)\, w_i^t$ (update rule)
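A minimal sketch of this procedure in Python, assuming 0/1 expert predictions and outcomes supplied as input:

```python
# Minimal sketch of the (deterministic) Weighted Majority algorithm.
# expert_predictions: T x n table of 0/1 predictions; outcomes: length-T list of 0/1.
def weighted_majority(expert_predictions, outcomes, eta=0.5):
    n = len(expert_predictions[0])
    w = [1.0] * n                        # w_i^1 := 1 for every expert
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # weighted vote: predict 1 iff the weight backing 1 is at least half the total
        weight_for_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if weight_for_one >= sum(w) / 2 else 0
        mistakes += (guess != y)
        # multiply the weight of every wrong expert by (1 - eta)
        w = [wi * (1 - eta) if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w
```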
Analysis
After $T$ steps, let $m_i^T$ be the number of mistakes of expert $i$ and $M^T$ be the number of mistakes our algorithm has made. Then we have the following bound for every $i$:
- $M^T \le 2(1+\eta)\, m_i^T + \frac{2 \ln n}{\eta}.$
In particular, this holds for the best expert $i$, that is, the one with the least $m_i^T$.
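A sketch of the standard potential argument behind this bound, assuming $0 < \eta \le 1/2$:

```latex
% Let \Phi^t = \sum_i w_i^t, so \Phi^1 = n.  Each time the algorithm errs,
% at least half of the total weight is multiplied by (1-\eta), hence
\Phi^{T+1} \le n \left(1 - \tfrac{\eta}{2}\right)^{M^T},
\qquad
(1-\eta)^{m_i^T} = w_i^{T+1} \le \Phi^{T+1}.
% Taking logarithms and using \eta/2 \le -\ln(1-\eta/2) together with
% -\ln(1-\eta) \le \eta + \eta^2 (valid for 0 < \eta \le 1/2) gives
M^T \le 2(1+\eta)\, m_i^T + \frac{2 \ln n}{\eta}.
```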
Multiplicative Weights Algorithm [2][5]
Initialization: Fix an $\eta \le 1/2$. For each decision, associate the weight $w_i^1 := 1$.

For $t = 1, 2, \dots, T$:

1. Choose decision $i$ with probability proportional to its weight $w_i^t$. That is, use the distribution $p^t = \{w_1^t/\Phi^t, \dots, w_n^t/\Phi^t\}$ over the decisions, where $\Phi^t = \sum_i w_i^t$.
2. Observe the costs $m^t$ of the decisions.
3. Penalize the costly decisions by updating their weights as follows: for every decision $i$, set

- $w_i^{t+1} = w_i^t (1 - \eta\, m_i^t)$
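A minimal sketch of this loop in Python, assuming the cost vectors $m^t$ (entries in $[-1,1]$) are supplied as input:

```python
import random

# Minimal sketch of the Multiplicative Weights algorithm with real-valued costs.
# cost_vectors: one length-n list of costs m_i^t in [-1, 1] per round.
def multiplicative_weights(cost_vectors, eta=0.5):
    n = len(cost_vectors[0])
    w = [1.0] * n                        # w_i^1 := 1
    total_cost = 0.0
    for m in cost_vectors:
        phi = sum(w)
        p = [wi / phi for wi in w]       # play decision i with probability w_i^t / Phi^t
        i = random.choices(range(n), weights=p)[0]
        total_cost += m[i]               # cost of the chosen decision this round
        # penalize costly decisions: w_i^{t+1} = w_i^t (1 - eta * m_i^t)
        w = [wi * (1 - eta * mi) for wi, mi in zip(w, m)]
    return total_cost, w
```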
Analysis
Similar to the analysis of the weighted majority algorithm, assume that all costs $m_i^t \in [-1,1]$ and $\eta \le 1/2$. Then the Multiplicative Weights Algorithm guarantees that after $T$ rounds, for any decision $i$, we have

$\sum_{t=1}^{T} m^t \cdot p^t \le \sum_{t=1}^{T} m_i^t + \eta \sum_{t=1}^{T} |m_i^t| + \frac{\ln n}{\eta}.$
Randomized Weighted Majority Algorithm [6]
In a situation where the total weights of the experts predicting positive and negative are both close to $1/2$, the deterministic weighted majority rule can be forced into many mistakes, so the prediction made by the algorithm is randomized instead. The algorithm computes the total weights $q_1$ and $q_0$ of the experts predicting positive and negative respectively, and then makes a random decision based on the corresponding fractions: predict

$\hat{y} = \begin{cases} 1 & \text{with probability } q_1 / W \\ 0 & \text{otherwise} \end{cases}$

where $W = \sum_i w_i = q_0 + q_1$.
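A minimal sketch in Python, assuming 0/1 expert predictions and outcomes and a fixed penalty factor $\beta$:

```python
import random

# Minimal sketch of Randomized Weighted Majority with penalty factor beta.
def randomized_weighted_majority(expert_predictions, outcomes, beta=0.5):
    n = len(expert_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        q1 = sum(wi for wi, p in zip(w, preds) if p == 1)   # total weight predicting 1
        W = sum(w)                                          # W = q0 + q1
        guess = 1 if random.random() < q1 / W else 0        # predict 1 with probability q1/W
        mistakes += (guess != y)
        # multiply the weight of every wrong expert by beta
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w
```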
Analysis
The number of mistakes made by the Randomized Weighted Majority Algorithm is bounded as:

$E[\#\text{mistakes of the learner}] \le \alpha_\beta \cdot (\#\text{mistakes of the best expert}) + c_\beta \ln N$

where $\alpha_\beta = \frac{\ln(1/\beta)}{1-\beta}$ and $c_\beta = \frac{1}{1-\beta}$.
Applications[3]
The multiplicative weights method is usually used to solve constrained optimization problems. Each expert corresponds to a constraint in the problem, and the events represent points in the region of interest. The penalty of an expert corresponds to how well its constraint is satisfied at the point represented by an event.
Solving Zero-Sum Games Approximately (Oracle Algorithm): [2][3][7]
Suppose we are given a distribution $P$ on the experts. Let $A$ be the payoff matrix of a finite two-player zero-sum game with $n$ rows. When the row player $p_r$ uses plan $i$ and the column player $p_c$ uses plan $j$, the payoff to player $p_c$ is $A(i,j) := A_{ij}$, assuming $A(i,j) \in [0,1]$. If the row player $p_r$ chooses action $i$ from a distribution $P$ over the rows, then the expected payoff for player $p_c$ selecting action $j$ is $A(P,j) = E_{i \sim P}[A(i,j)]$. Hence, in order to maximize $A(P,j)$, player $p_c$ should choose the plan $j$ attaining this maximum. Similarly, the expected payoff for player $p_r$ against a column distribution $Q$ is determined by $A(i,Q) = E_{j \sim Q}[A(i,j)]$, and choosing the plan $i$ that minimizes this quantity is optimal for $p_r$. By John von Neumann’s min-max theorem, we obtain:

$\min_P \max_j A(P,j) = \max_Q \min_i A(i,Q)$

where $P$ and $i$ range over the distributions over rows, and $Q$ and $j$ range over the columns. Let $\lambda^*$ denote the common value of the above quantities, also called the “value of the game”, and let $\delta > 0$ be an error parameter. Solving the zero-sum game up to an additive error of $\delta$ means finding mixed strategies $\tilde{p}$ and $\tilde{q}$ such that

$\lambda^* - \delta \le \min_i A(i, \tilde{q})$ and $\max_j A(\tilde{p}, j) \le \lambda^* + \delta.$
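A minimal sketch of this approach in Python, assuming the payoff matrix is given as a list of rows with entries in $[0,1]$; the choices $\eta = \delta/2$ and $T \approx 4\ln(n)/\delta^2$ are illustrative, and the column player's best response plays the role of the oracle:

```python
import math

# Sketch: the row player runs multiplicative weights, the column player best-responds.
# A[i][j] in [0, 1] is the payoff to the column player; returns approximate
# optimal mixed strategies (p_tilde, q_tilde).
def solve_zero_sum(A, delta=0.1):
    n, m = len(A), len(A[0])
    eta = delta / 2
    T = max(1, int(4 * math.log(n) / delta ** 2))
    w = [1.0] * n
    p_avg = [0.0] * n                    # running average of the row strategies
    q_avg = [0.0] * m                    # empirical distribution of column responses
    for _ in range(T):
        phi = sum(w)
        p = [wi / phi for wi in w]
        # column player's best response: maximize the expected payoff A(P, j)
        j = max(range(m), key=lambda col: sum(p[i] * A[i][col] for i in range(n)))
        q_avg[j] += 1.0 / T
        for i in range(n):
            p_avg[i] += p[i] / T
            # the row player treats A(i, j) as the cost of row i this round
            w[i] *= (1 - eta * A[i][j])
    return p_avg, q_avg
```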
Machine Learning
In machine learning, Littlestone and Warmuth generalized the Winnow algorithm to the weighted majority algorithm [8]. Later, Freund and Schapire generalized it in the form of the Hedge algorithm [9]. The AdaBoost algorithm formulated by Yoav Freund and Robert Schapire also employs the multiplicative weight update method [2].
Winnow Algorithm [2]
Based on current knowledge in algorithms, the multiplicative weight update method was first used in Littlestone’s Winnow algorithm. It is used in machine learning to solve a linear program.

Given $m$ labeled examples $(a_1, l_1), \dots, (a_m, l_m)$ where $a_j \in \mathbb{R}^n$ are feature vectors and $l_j \in \{-1, 1\}$ are their labels, the aim is to find non-negative weights $x$ such that, for all examples, the sign of the weighted combination of the features matches its label; that is, $l_j\, a_j \cdot x \ge 0$ for all $j$. Without loss of generality, assume the total weight is 1, so that the weights form a distribution. Thus, for notational convenience, redefine $a_j$ to be $l_j a_j$; the problem reduces to finding a solution to the following LP:

$\forall\, j = 1, 2, \dots, m: \quad a_j \cdot x \ge 0,$
$1 \cdot x = 1,$
$\forall\, i: \quad x_i \ge 0.$

Initialization: Fix an $\varepsilon \le 1/2$. For each coordinate $i$ (the “experts”), associate the weight $w_i^1 := 1$.

For $t = 1, 2, \dots, T$:

1. Pick the distribution $p_i^t = w_i^t / \Phi^t$ where $\Phi^t = \sum_i w_i^t$.
2. Observe the cost vector $m^t$ of the decisions.
3. Set $w_i^{t+1} = w_i^t \cdot \exp(-\varepsilon\, m_i^t)$.

This is the general multiplicative-weights (Hedge-type) procedure applied to the LP, as sketched below.
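A minimal sketch of this loop in Python, assuming each round's cost vector is $m^t = -a_j$ for some currently violated constraint $j$; the step size `eps` and round limit `T` are illustrative choices:

```python
import math

# Sketch: find a distribution x over the n coordinates with a_j . x >= 0 for all j.
# Coordinates that would help satisfy a violated constraint gain weight.
def winnow_style_lp(a_rows, eps=0.1, T=1000):
    n = len(a_rows[0])
    w = [1.0] * n                                    # w_i^1 := 1
    for _ in range(T):
        phi = sum(w)
        x = [wi / phi for wi in w]                   # current candidate solution p^t
        violated = next((a for a in a_rows
                         if sum(ai * xi for ai, xi in zip(a, x)) < 0), None)
        if violated is None:
            return x                                 # a_j . x >= 0 holds for every j
        # cost m_i^t = -a_{j,i}; update w_i^{t+1} = w_i^t * exp(-eps * m_i^t)
        w = [wi * math.exp(eps * ai) for wi, ai in zip(w, violated)]
    return None                                      # no feasible point found within T rounds
```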
Hedge Algorithm [1]
The Hedge algorithm is similar to the weighted majority algorithm; however, its update rule is exponential rather than a fixed multiplicative penalty.
Analysis
Assume $\varepsilon \le 1$ and that for $t \in [T]$, $p^t$ is picked by Hedge. Then for all experts $i$,

$\sum_{t \le T} p^t \cdot m^t \le \sum_{t \le T} m_i^t + \frac{\ln N}{\varepsilon} + \varepsilon T.$
Initialization: Fix an $\varepsilon \le 1/2$. For each expert, associate the weight $w_i^1 := 1$.

For $t = 1, 2, \dots, T$:

1. Pick the distribution $p_i^t = w_i^t / \Phi^t$ where $\Phi^t = \sum_i w_i^t$.
2. Observe the cost vector $m^t$ of the decisions.
3. Set $w_i^{t+1} = w_i^t \cdot \exp(-\varepsilon\, m_i^t)$.
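A minimal sketch of Hedge in Python, assuming the cost vectors $m^t$ (entries in $[-1,1]$) are supplied as input:

```python
import math

# Minimal sketch of the Hedge algorithm with exponential updates.
def hedge(cost_vectors, eps=0.1):
    n = len(cost_vectors[0])
    w = [1.0] * n                                    # w_i^1 := 1
    expected_cost = 0.0
    for m in cost_vectors:
        phi = sum(w)
        p = [wi / phi for wi in w]                   # distribution p^t over experts
        expected_cost += sum(pi * mi for pi, mi in zip(p, m))   # p^t . m^t
        # exponential update: w_i^{t+1} = w_i^t * exp(-eps * m_i^t)
        w = [wi * math.exp(-eps * mi) for wi, mi in zip(w, m)]
    return expected_cost, w
```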
AdaBoost Algorithm [10]
Input:
- Sequence of $N$ labeled examples $\langle (x_1, y_1), \dots, (x_N, y_N) \rangle$
- Distribution $D$ over the $N$ examples
- Weak learning algorithm WeakLearn
- Integer $T$ specifying the number of iterations

Initialize the weight vector: $w_i^1 = D(i)$ for $i = 1, \dots, N$.

Do for $t = 1, 2, \dots, T$:

1. Set $p^t = \frac{w^t}{\sum_{i=1}^{N} w_i^t}$.
2. Call WeakLearn, providing it with the distribution $p^t$; get back a hypothesis $h_t : X \to [0,1]$.
3. Calculate the error of $h_t$: $\varepsilon_t = \sum_{i=1}^{N} p_i^t\, |h_t(x_i) - y_i|$.
4. Set $\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}$.
5. Set the new weight vector to be $w_i^{t+1} = w_i^t\, \beta_t^{\,1 - |h_t(x_i) - y_i|}$.

Output the hypothesis

$h_f(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \left(\log \frac{1}{\beta_t}\right) h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \log \frac{1}{\beta_t} \\ 0 & \text{otherwise.} \end{cases}$
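A minimal, self-contained sketch of this loop in Python; the stump-based `weak_learn` below is a hypothetical stand-in for WeakLearn, and examples are assumed to be feature vectors with labels $y_i \in \{0, 1\}$:

```python
import math

def weak_learn(X, y, p):
    """Hypothetical weak learner: best single-feature threshold stump under weights p."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for s in (0, 1):                       # try both orientations of the stump
                h = lambda x, f=f, thr=thr, s=s: s if x[f] >= thr else 1 - s
                err = sum(pi * abs(h(xi) - yi) for pi, xi, yi in zip(p, X, y))
                if err < best_err:
                    best, best_err = h, err
    return best

def ada_boost(X, y, T=10):
    N = len(X)
    w = [1.0 / N] * N                              # w_i^1 = D(i), with D uniform here
    hypotheses, betas = [], []
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]               # p^t = w^t / sum_i w_i^t
        h = weak_learn(X, y, p)
        eps_t = sum(pi * abs(h(xi) - yi) for pi, xi, yi in zip(p, X, y))
        eps_t = min(max(eps_t, 1e-6), 0.5 - 1e-6)  # assume WeakLearn beats random guessing
        beta_t = eps_t / (1 - eps_t)
        hypotheses.append(h)
        betas.append(beta_t)
        # w_i^{t+1} = w_i^t * beta_t^(1 - |h_t(x_i) - y_i|)
        w = [wi * beta_t ** (1 - abs(h(xi) - yi)) for wi, xi, yi in zip(w, X, y)]

    def h_final(x):
        # weighted vote of the weak hypotheses, thresholded at half the total log-weight
        lhs = sum(math.log(1 / b) * h(x) for h, b in zip(hypotheses, betas))
        return 1 if lhs >= 0.5 * sum(math.log(1 / b) for b in betas) else 0
    return h_final
```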
Solving Linear Programs Approximately
Problem
Given an $m \times n$ matrix $A$ and $b \in \mathbb{R}^m$, is there an $x$ such that $Ax \ge b$?

$\exists?\, x: \quad Ax \ge b \quad (1)$
Assumption
Using the oracle algorithm from the zero-sum game setting, with an error parameter $\epsilon > 0$, the output is either a point $x$ such that $Ax \ge b - \epsilon$ or a proof that no such $x$ exists, i.e., that there is no solution to this linear system of inequalities.
Solution
Given a vector $p \in \Delta_m$ (a distribution over the $m$ constraints), the oracle solves the following relaxed problem:

$\exists?\, x: \quad p^{\mathsf T} A x \ge p^{\mathsf T} b \quad (2)$

If there exists an $x$ satisfying (1), then that $x$ satisfies (2) for all $p \in \Delta_m$; the contrapositive of this statement is also true. Suppose that whenever the oracle returns a feasible solution for some $p$, the solution $x$ it returns has bounded width $\max_i |(Ax)_i - b_i| \le 1$. Then, if there is a solution to (1), there is an algorithm whose output $x$ satisfies $Ax \ge b - 2\epsilon$, i.e., the system (1) up to an additive error of $2\epsilon$. The algorithm makes at most $\frac{\ln m}{\epsilon^2}$ calls to a width-bounded oracle for problem (2). The contrapositive holds as well. The multiplicative weights update is applied within this algorithm, as sketched below.
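A minimal sketch of this scheme in Python, where `oracle` is an assumed user-supplied routine that, given the current distribution $p$ over constraints, either returns an $x$ with $p^{\mathsf T} A x \ge p^{\mathsf T} b$ and width $\max_i |(Ax)_i - b_i| \le 1$, or returns `None` when no such $x$ exists:

```python
import math

# Sketch of the width-bounded oracle scheme for deciding Ax >= b approximately.
def mw_lp_feasibility(A, b, oracle, eps=0.1):
    m = len(A)
    T = max(1, int(math.ceil(math.log(m) / eps ** 2)))   # ~ ln(m) / eps^2 oracle calls
    w = [1.0] * m
    xs = []
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]         # distribution over the m constraints
        x = oracle(p)                        # solve the relaxed problem (2)
        if x is None:
            return None                      # certificate that (1) is infeasible
        xs.append(x)
        # constraint i has slack (Ax)_i - b_i in [-1, 1]; well-satisfied constraints
        # lose weight so attention shifts to the ones that are violated
        for i in range(m):
            slack = sum(A[i][j] * x[j] for j in range(len(x))) - b[i]
            w[i] *= (1 - eps * slack)
    # the average of the oracle's answers satisfies Ax >= b - 2*eps
    n = len(xs[0])
    return [sum(x[j] for x in xs) / len(xs) for j in range(n)]
```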
Other Applications
Operations Research and Online Statistical Decision Making [2]
In the fields of operations research and online statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.
Computational Geometry [2]
The multiplicative weights algorithm is also widely applied in computational geometry, for example in Clarkson's algorithm for linear programming (LP) with a bounded number of variables in linear time [12, 13]. Later, Bronnimann and Goodrich employed analogous methods to find set covers for hypergraphs with small VC dimension [14].
Gradient Descent Method [3]
Matrix Multiplicative Weights Update [3]
Plotkin, Shmoys, Tardos Framework for Packing/Covering LPs [2]
Approximating Multi-commodity Flow Problems [2]
O (logn)- Approximation for many NP-hard problems [2]
Learning Theory and Boosting [2]
Hard-Core Sets and the XOR Lemma [2]
Hannan’s Algorithm and Multiplicative weights [2]
Online Convex Optimization [2]
References
- ^ "Efficient Algorithms Using The Multiplicative Weights Update Method" (PDF). November 2007.
- ^ "The Multiplicative Weights Algorithm*" (PDF). www.cs.cmu.edu. Retrieved 2016-11-09.
- ^ "The Multiplicative Weights Algorithm*" (PDF). Apple.com. October 2012. Retrieved 2016-10-21.
- ^ "A sublinear-time randomized approximation algorithm for matrix games". 1995.