Random coordinate descent

The random coordinate descent method is an optimization technique that has been popularized in recent years. It can be seen as a generalization of the coordinate descent method with one modification: coordinate directions are chosen randomly. Many authors reported that randomization improves the rate of convergence (the time or number of iterations needed to reach a solution), but a theoretical analysis was lacking. The first proper analysis of such methods is due to Yurii Nesterov (2010)^[1]. That paper proposes a randomized coordinate descent method that uses first-order information.

Algorithm

Consider the problem $\min _{x}\{f(x)\},$ where $f$ is a smooth convex function. Nesterov showed that if the gradient of $f$ is coordinate-wise Lipschitz continuous with constants $L_{i}$ (i.e. $|\nabla f_{i}(x)-\nabla f_{i}(x+he_{i})|\leq L_{i}|h|$, where $\nabla f_{i}$ denotes the $i$-th partial derivative of $f$ and $e_{i}$ the $i$-th coordinate direction), then the following algorithm converges to the optimal value.

Algorithm Random Coordinate Descent Method
  Input:  $x_{0}\in R^{n}$  //starting point
  Output:  $x$ 
  set x=x_0
  for k=1,... do
     choose random coordinate  $i\in \{1,2,\dots ,n\}$ 
     update  $x^{(i)}=x^{(i)}-{\frac {1}{L_{i}}}\nabla f_{i}(x)$  
  endfor;
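
The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from ^[1]: the names `random_coordinate_descent`, `partial_derivative` and `lipschitz` are hypothetical, the coordinate is drawn uniformly at random, the loop stops after a fixed number of iterations, and the separable test objective $f(x)=\sum _{i}{\tfrac {1}{2}}c_{i}(x^{(i)}-t_{i})^{2}$ is made up for the demonstration.

```python
import random


def random_coordinate_descent(partial_derivative, lipschitz, x0, num_iters, seed=0):
    """Apply x_i <- x_i - (1 / L_i) * df/dx_i(x) to a randomly chosen coordinate i.

    partial_derivative(x, i) returns the i-th partial derivative of f at x.
    lipschitz[i] is the coordinate-wise Lipschitz constant L_i.
    """
    rng = random.Random(seed)  # fixed seed only to make the run reproducible
    x = list(x0)
    n = len(x)
    for _ in range(num_iters):
        i = rng.randrange(n)  # assumption: uniform sampling over coordinates
        x[i] -= partial_derivative(x, i) / lipschitz[i]
    return x


# Made-up separable objective: f(x) = sum_i 0.5 * c[i] * (x[i] - t[i])**2.
# Its i-th partial derivative is c[i] * (x[i] - t[i]) and L_i = c[i].
c = [1.0, 4.0, 9.0]
t = [3.0, -1.0, 2.0]
x_final = random_coordinate_descent(
    lambda x, i: c[i] * (x[i] - t[i]), c, [0.0, 0.0, 0.0], num_iters=50
)
```

For this separable objective a single update of coordinate $i$ already lands on its optimal value $t_{i}$, so `x_final` equals `t` as soon as every coordinate has been drawn at least once.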


Convergence rate

Note that this algorithm is randomized, so its output after $k$ iterations is a random variable. It was shown in ^[2] that if $k\geq {\frac {2nR_{L}(x_{0})}{\epsilon }}\log \left({\frac {f(x_{0})-f^{*}}{\epsilon \rho }}\right)$ , where $R_{L}(x)=\max _{y}\max _{x^{*}\in X^{*}}\{\|y-x^{*}\|_{L}:f(y)\leq f(x)\}$ , $X^{*}$ is the set of optimal solutions, $\|x\|_{L}=\left(\sum _{i}L_{i}(x^{(i)})^{2}\right)^{1/2}$ , $f^{*}$ is the optimal value ( $f^{*}=\min _{x}\{f(x)\}$ ), $\rho \in (0,1)$ is the confidence level and $\epsilon >0$ is the target accuracy, then $\mathrm {Prob} (f(x_{k})-f^{*}>\epsilon )\leq \rho$ .
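
To get a feeling for the size of this bound, the right-hand side can be evaluated numerically. The helper below is a hypothetical sketch: the name `iterations_for_confidence` and its parameters are not from ^[2], it evaluates the formula exactly as stated above, and the input values are illustrative rather than derived from a concrete problem.

```python
import math


def iterations_for_confidence(n, radius, gap, epsilon, rho):
    """Smallest integer k with k >= (2 n R / eps) * log(gap / (eps * rho)).

    n       -- number of coordinates
    radius  -- the quantity R_L(x_0) from the text (assumed to be known)
    gap     -- the initial gap f(x_0) - f*
    epsilon -- target accuracy, epsilon > 0
    rho     -- confidence level, 0 < rho < 1
    """
    if epsilon <= 0 or not 0 < rho < 1 or gap <= 0:
        raise ValueError("need epsilon > 0, 0 < rho < 1 and gap > 0")
    bound = (2.0 * n * radius / epsilon) * math.log(gap / (epsilon * rho))
    return max(0, math.ceil(bound))


# Illustrative numbers only.
k = iterations_for_confidence(n=2, radius=1.0, gap=1.0, epsilon=0.1, rho=0.1)
k_tighter = iterations_for_confidence(n=2, radius=1.0, gap=1.0, epsilon=0.01, rho=0.1)
```

Tightening the accuracy from `0.1` to `0.01` increases the bound by more than a factor of ten, because $\epsilon$ enters both the leading factor and the logarithm.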

Example on a particular function

The figure shows how $x_{k}$ evolves over the iterations. The problem is

$f(x)={\tfrac {1}{2}}x^{T}\left({\begin{array}{cc}1&0.5\\0.5&1\end{array}}\right)x-\left({\begin{array}{cc}1.5&1.5\end{array}}\right)x,\quad x_{0}=\left({\begin{array}{cc}0&0\end{array}}\right)^{T}$
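
The iterates for this problem can be reproduced with a short script. It is a minimal sketch, assuming uniform sampling of the two coordinates, an arbitrarily chosen seed and a budget of 200 iterations (none of which are specified in the text). For this quadratic the $i$-th partial derivative is $(Ax-b)_{i}$ and the coordinate-wise Lipschitz constants are the diagonal entries of the matrix, so $L_{1}=L_{2}=1$.

```python
import random

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]


def f(x):
    """f(x) = 0.5 * x^T A x - b^T x for the 2x2 problem above."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2))


def partial_derivative(x, i):
    """i-th component of the gradient A x - b."""
    return sum(A[i][j] * x[j] for j in range(2)) - b[i]


rng = random.Random(0)   # arbitrary seed, for reproducibility only
x = [0.0, 0.0]           # starting point x_0
history = [f(x)]         # objective value after each iteration
for k in range(200):
    i = rng.randrange(2)                       # random coordinate
    x[i] -= partial_derivative(x, i) / A[i][i]  # step size 1 / L_i
    history.append(f(x))
```

Solving $Ax=b$ by hand gives the minimizer $(1,1)^{T}$ with $f^{*}=-1.5$, which is the point the iterates approach; the values in `history` never increase.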

Extension to Block Coordinate Setting

Figure: grouping coordinate directions into block coordinate directions

One can naturally extend this algorithm from single coordinates to blocks of coordinates. Consider the space $R^{5}$ . This space has 5 coordinate directions, namely $e_{1}=(1,0,0,0,0)^{T},e_{2}=(0,1,0,0,0)^{T},e_{3}=(0,0,1,0,0)^{T},e_{4}=(0,0,0,1,0)^{T},e_{5}=(0,0,0,0,1)^{T}$ , along which the random coordinate descent method can move. However, one can group some coordinate directions into blocks, so that instead of those 5 coordinate directions there are 3 block coordinate directions (see image).
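
A block version of the update can be sketched as follows. Everything specific here is an assumption made for illustration: the grouping $\{1,2\},\{3\},\{4,5\}$ (written with 0-based indices in the code) is hypothetical and need not match the image, the quadratic objective is made up, blocks are sampled uniformly, and each block uses the step size $1/L_{b}$ with $L_{b}$ taken as the trace of the corresponding diagonal block of the matrix, which is a simple upper bound on the block Lipschitz constant for a positive semidefinite block.

```python
import random

n = 5
# Made-up tridiagonal, diagonally dominant (hence positive definite) matrix.
A = [[2.0 if i == j else 0.5 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]      # chosen so that the minimizer is (1, 1, 1, 1, 1)
blocks = [[0, 1], [2], [3, 4]]   # hypothetical grouping into 3 blocks

# Conservative block constants: trace of each diagonal block of A.
L_block = [sum(A[i][i] for i in blk) for blk in blocks]


def f(x):
    """f(x) = 0.5 * x^T A x - b^T x."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


rng = random.Random(0)           # arbitrary seed, for reproducibility only
x = [0.0] * n
values = [f(x)]
for _ in range(2000):
    j = rng.randrange(len(blocks))   # choose a random block
    blk = blocks[j]
    # Partial derivatives for the whole block, evaluated before any update.
    grad_blk = [sum(A[i][c] * x[c] for c in range(n)) - b[i] for i in blk]
    for i, g in zip(blk, grad_blk):
        x[i] -= g / L_block[j]       # move along the block direction
    values.append(f(x))
```

With single-coordinate blocks `[[0], [1], [2], [3], [4]]` the same loop reduces to the coordinate method described earlier.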

References

[1] Nesterov, Yurii (2010), "Efficiency of coordinate descent methods on huge-scale optimization problems", CORE Discussion Paper, no. 2010/2
[2] Richtárik, Peter; Takáč, Martin (2011), "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function"
