SPIKE algorithm

The SPIKE algorithm is a hybrid parallel solver for narrow-banded linear systems developed by Eric Polizzi and Ahmed Sameh.

Description

The SPIKE algorithm deals with a linear system $AX = F$ , where $A$ is a banded $n\times n$ matrix of bandwidth much less than $n$ , and $F$ is an $n\times s$ matrix containing $s$ right-hand sides. It is divided into a preprocessing stage and a postprocessing stage.

Preprocessing stage

In the preprocessing stage, the linear system $AX = F$ is partitioned into a block tridiagonal form

{\begin{bmatrix}{\boldsymbol {A}}_{1}&{\boldsymbol {B}}_{1}\\{\boldsymbol {C}}_{2}&{\boldsymbol {A}}_{2}&{\boldsymbol {B}}_{2}\\&\ddots &\ddots &\ddots \\&&{\boldsymbol {C}}_{p-1}&{\boldsymbol {A}}_{p-1}&{\boldsymbol {B}}_{p-1}\\&&&{\boldsymbol {C}}_{p}&{\boldsymbol {A}}_{p}\end{bmatrix}}{\begin{bmatrix}{\boldsymbol {X}}_{1}\\{\boldsymbol {X}}_{2}\\\vdots \\{\boldsymbol {X}}_{p-1}\\{\boldsymbol {X}}_{p}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {F}}_{1}\\{\boldsymbol {F}}_{2}\\\vdots \\{\boldsymbol {F}}_{p-1}\\{\boldsymbol {F}}_{p}\end{bmatrix}}.

Assume, for the time being, that the diagonal blocks $A_j$ ( $j=1,\ldots ,p$ ) are nonsingular. Define a block diagonal matrix

{\boldsymbol {D}}=\operatorname {diag} ({\boldsymbol {A}}_{1},\ldots ,{\boldsymbol {A}}_{p}),

then $D$ is also nonsingular. Left-multiplying both sides of the system by $D^{-1}$ gives

{\begin{bmatrix}{\boldsymbol {I}}&{\boldsymbol {V}}_{1}\\{\boldsymbol {W}}_{2}&{\boldsymbol {I}}&{\boldsymbol {V}}_{2}\\&\ddots &\ddots &\ddots \\&&{\boldsymbol {W}}_{p-1}&{\boldsymbol {I}}&{\boldsymbol {V}}_{p-1}\\&&&{\boldsymbol {W}}_{p}&{\boldsymbol {I}}\end{bmatrix}}{\begin{bmatrix}{\boldsymbol {X}}_{1}\\{\boldsymbol {X}}_{2}\\\vdots \\{\boldsymbol {X}}_{p-1}\\{\boldsymbol {X}}_{p}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {G}}_{1}\\{\boldsymbol {G}}_{2}\\\vdots \\{\boldsymbol {G}}_{p-1}\\{\boldsymbol {G}}_{p}\end{bmatrix}},

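which is to be solved in the postprocessing stage. Left-multiplication by $D^{-1}$ is equivalent to solving $p$ systems of the form

{\boldsymbol {A}}_{j}{\begin{bmatrix}{\boldsymbol {V}}_{j}&{\boldsymbol {W}}_{j}&{\boldsymbol {G}}_{j}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {B}}_{j}&{\boldsymbol {C}}_{j}&{\boldsymbol {F}}_{j}\end{bmatrix}}

(omitting $W_1$ and $C_1$ for $j=1$ , and $V_p$ and $B_p$ for $j=p$ ), which can be carried out in parallel.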

Due to the narrow-banded nature of $A$ , only a few leftmost columns of each $V_j$ and a few rightmost columns of each $W_j$ can be nonzero. These columns are called the spikes.
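The preprocessing stage can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `spike_preprocess`, the equal-sized partitioning, the dense storage, and the use of zero blocks in place of the omitted $W_1$ and $V_p$ are all assumptions made for illustration. The loop over $j$ is written sequentially, although each iteration is independent and could run in parallel.

```python
import numpy as np

def spike_preprocess(A, F, p):
    """Return lists V, W, G with A_j [V_j W_j G_j] = [B_j C_j F_j].

    Illustrative assumptions: A is dense, p divides n, and the missing
    neighbours of the first and last block are represented by zeros.
    """
    n = A.shape[0]
    k = n // p  # rows per diagonal block
    V, W, G = [], [], []
    for j in range(p):
        lo, hi = j * k, (j + 1) * k
        Aj = A[lo:hi, lo:hi]
        # Off-diagonal coupling blocks of the block tridiagonal form.
        Bj = A[lo:hi, hi:hi + k] if j < p - 1 else np.zeros((k, k))
        Cj = A[lo:hi, lo - k:lo] if j > 0 else np.zeros((k, k))
        # A single solve handles the stacked right-hand sides [B_j C_j F_j].
        sol = np.linalg.solve(Aj, np.hstack([Bj, Cj, F[lo:hi]]))
        V.append(sol[:, :k])
        W.append(sol[:, k:2 * k])
        G.append(sol[:, 2 * k:])
    return V, W, G

# Small banded test matrix: at most m nonzero diagonals on each side,
# with a boosted diagonal so that every diagonal block is nonsingular.
rng = np.random.default_rng(0)
n, p, m = 12, 3, 2
A = np.triu(np.tril(rng.standard_normal((n, n)), m), -m) + 10 * np.eye(n)
F = rng.standard_normal((n, 2))
V, W, G = spike_preprocess(A, F, p)

# Only the leftmost m columns of V_j and the rightmost m columns of W_j
# (the spikes) can be nonzero.
print(np.allclose(V[0][:, m:], 0), np.allclose(W[1][:, :-m], 0))
```

In practice only the spike columns would be computed and stored; the full blocks are kept here to make the sparsity pattern visible.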

Postprocessing stage

Without loss of generality, assume that each spike contains exactly $m$ columns, where $m$ is much less than $n$ (pad the spike with columns of zeroes if necessary). Partition the spikes in all $V_j$ and $W_j$ into

{\begin{bmatrix}{\boldsymbol {V}}_{j}^{(t)}\\{\boldsymbol {V}}_{j}'\\{\boldsymbol {V}}_{j}^{(b)}\end{bmatrix}}

and

{\begin{bmatrix}{\boldsymbol {W}}_{j}^{(t)}\\{\boldsymbol {W}}_{j}'\\{\boldsymbol {W}}_{j}^{(b)}\\\end{bmatrix}}

where $V_j^{(t)}$ , $V_j^{(b)}$ , $W_j^{(t)}$ and $W_j^{(b)}$ are of dimensions $m\times m$ . Partition similarly all $X_j$ and $G_j$ into

{\begin{bmatrix}{\boldsymbol {X}}_{j}^{(t)}\\{\boldsymbol {X}}_{j}'\\{\boldsymbol {X}}_{j}^{(b)}\end{bmatrix}}

and

{\begin{bmatrix}{\boldsymbol {G}}_{j}^{(t)}\\{\boldsymbol {G}}_{j}'\\{\boldsymbol {G}}_{j}^{(b)}\\\end{bmatrix}}.

Notice that the system produced by the preprocessing stage can be reduced to a block pentadiagonal system of much smaller size (recall that $m$ is much less than $n$ )

{\begin{bmatrix}{\boldsymbol {I}}_{m}&{\boldsymbol {0}}&{\boldsymbol {V}}_{1}^{(t)}\\{\boldsymbol {0}}&{\boldsymbol {I}}_{m}&{\boldsymbol {V}}_{1}^{(b)}&{\boldsymbol {0}}\\{\boldsymbol {0}}&{\boldsymbol {W}}_{2}^{(t)}&{\boldsymbol {I}}_{m}&{\boldsymbol {0}}&{\boldsymbol {V}}_{2}^{(t)}\\&{\boldsymbol {W}}_{2}^{(b)}&{\boldsymbol {0}}&{\boldsymbol {I}}_{m}&{\boldsymbol {V}}_{2}^{(b)}&{\boldsymbol {0}}\\&&\ddots &\ddots &\ddots &\ddots &\ddots \\&&&{\boldsymbol {0}}&{\boldsymbol {W}}_{p-1}^{(t)}&{\boldsymbol {I}}_{m}&{\boldsymbol {0}}&{\boldsymbol {V}}_{p-1}^{(t)}\\&&&&{\boldsymbol {W}}_{p-1}^{(b)}&{\boldsymbol {0}}&{\boldsymbol {I}}_{m}&{\boldsymbol {V}}_{p-1}^{(b)}&{\boldsymbol {0}}\\&&&&&{\boldsymbol {0}}&{\boldsymbol {W}}_{p}^{(t)}&{\boldsymbol {I}}_{m}&{\boldsymbol {0}}\\&&&&&&{\boldsymbol {W}}_{p}^{(b)}&{\boldsymbol {0}}&{\boldsymbol {I}}_{m}\end{bmatrix}}{\begin{bmatrix}{\boldsymbol {X}}_{1}^{(t)}\\{\boldsymbol {X}}_{1}^{(b)}\\{\boldsymbol {X}}_{2}^{(t)}\\{\boldsymbol {X}}_{2}^{(b)}\\\vdots \\{\boldsymbol {X}}_{p-1}^{(t)}\\{\boldsymbol {X}}_{p-1}^{(b)}\\{\boldsymbol {X}}_{p}^{(t)}\\{\boldsymbol {X}}_{p}^{(b)}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {G}}_{1}^{(t)}\\{\boldsymbol {G}}_{1}^{(b)}\\{\boldsymbol {G}}_{2}^{(t)}\\{\boldsymbol {G}}_{2}^{(b)}\\\vdots \\{\boldsymbol {G}}_{p-1}^{(t)}\\{\boldsymbol {G}}_{p-1}^{(b)}\\{\boldsymbol {G}}_{p}^{(t)}\\{\boldsymbol {G}}_{p}^{(b)}\end{bmatrix}}.
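As a worked instance, consider the smallest case $p=2$ , where $W_1$ and $V_2$ are absent. The reduced system above then has only four block rows:

{\begin{bmatrix}{\boldsymbol {I}}_{m}&{\boldsymbol {0}}&{\boldsymbol {V}}_{1}^{(t)}&{\boldsymbol {0}}\\{\boldsymbol {0}}&{\boldsymbol {I}}_{m}&{\boldsymbol {V}}_{1}^{(b)}&{\boldsymbol {0}}\\{\boldsymbol {0}}&{\boldsymbol {W}}_{2}^{(t)}&{\boldsymbol {I}}_{m}&{\boldsymbol {0}}\\{\boldsymbol {0}}&{\boldsymbol {W}}_{2}^{(b)}&{\boldsymbol {0}}&{\boldsymbol {I}}_{m}\end{bmatrix}}{\begin{bmatrix}{\boldsymbol {X}}_{1}^{(t)}\\{\boldsymbol {X}}_{1}^{(b)}\\{\boldsymbol {X}}_{2}^{(t)}\\{\boldsymbol {X}}_{2}^{(b)}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {G}}_{1}^{(t)}\\{\boldsymbol {G}}_{1}^{(b)}\\{\boldsymbol {G}}_{2}^{(t)}\\{\boldsymbol {G}}_{2}^{(b)}\end{bmatrix}}.

The two middle block rows involve only $X_1^{(b)}$ and $X_2^{(t)}$ , so they form a $2m\times 2m$ system that can be solved on its own:

{\begin{bmatrix}{\boldsymbol {I}}_{m}&{\boldsymbol {V}}_{1}^{(b)}\\{\boldsymbol {W}}_{2}^{(t)}&{\boldsymbol {I}}_{m}\end{bmatrix}}{\begin{bmatrix}{\boldsymbol {X}}_{1}^{(b)}\\{\boldsymbol {X}}_{2}^{(t)}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {G}}_{1}^{(b)}\\{\boldsymbol {G}}_{2}^{(t)}\end{bmatrix}}.

The remaining two unknowns then follow from the first and last block rows by substitution:

{\boldsymbol {X}}_{1}^{(t)}={\boldsymbol {G}}_{1}^{(t)}-{\boldsymbol {V}}_{1}^{(t)}{\boldsymbol {X}}_{2}^{(t)},\qquad {\boldsymbol {X}}_{2}^{(b)}={\boldsymbol {G}}_{2}^{(b)}-{\boldsymbol {W}}_{2}^{(b)}{\boldsymbol {X}}_{1}^{(b)}.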

Once all $X_j^{(t)}$ and $X_j^{(b)}$ are found, all $X_j'$ can be recovered with perfect parallelism via

{\begin{cases}{\boldsymbol {X}}_{1}'={\boldsymbol {G}}_{1}'-{\boldsymbol {V}}_{1}'{\boldsymbol {X}}_{2}^{(t)}{\text{,}}\\{\boldsymbol {X}}_{j}'={\boldsymbol {G}}_{j}'-{\boldsymbol {V}}_{j}'{\boldsymbol {X}}_{j+1}^{(t)}-{\boldsymbol {W}}_{j}'{\boldsymbol {X}}_{j-1}^{(b)}{\text{,}}&j=2,\ldots ,p-1{\text{,}}\\{\boldsymbol {X}}_{p}'={\boldsymbol {G}}_{p}'-{\boldsymbol {W}}_{p}'{\boldsymbol {X}}_{p-1}^{(b)}{\text{.}}\end{cases}}
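Both stages can be put together in a short NumPy sketch. It is an illustration under stated assumptions, not a reference implementation: the name `spike_solve` is hypothetical, the matrix is stored densely, $p$ is assumed to divide $n$ , each block is assumed to have at least $2m$ rows, and the reduced system is solved with a dense solver rather than by exploiting its block structure. Only the $m$ spike columns of each $V_j$ and $W_j$ are stored.

```python
import numpy as np

def spike_solve(A, F, p, m):
    """Solve A X = F with p equal blocks and spikes of width m (sketch)."""
    n, s = F.shape
    k = n // p  # rows per diagonal block; assumes k >= 2 * m
    V, W, G = [], [], []
    for j in range(p):  # preprocessing: independent block solves
        lo, hi = j * k, (j + 1) * k
        rhs = np.zeros((k, 2 * m + s))
        if j < p - 1:
            rhs[:, :m] = A[lo:hi, hi:hi + m]       # nonzero columns of B_j
        if j > 0:
            rhs[:, m:2 * m] = A[lo:hi, lo - m:lo]  # nonzero columns of C_j
        rhs[:, 2 * m:] = F[lo:hi]
        sol = np.linalg.solve(A[lo:hi, lo:hi], rhs)
        V.append(sol[:, :m])
        W.append(sol[:, m:2 * m])
        G.append(sol[:, 2 * m:])

    # Reduced system, unknowns ordered X_1^(t), X_1^(b), ..., X_p^(t), X_p^(b).
    R = np.eye(2 * p * m)
    g = np.zeros((2 * p * m, s))
    for j in range(p):
        top = slice(2 * j * m, (2 * j + 1) * m)
        bot = slice((2 * j + 1) * m, (2 * j + 2) * m)
        g[top], g[bot] = G[j][:m], G[j][-m:]
        if j < p - 1:  # V_j couples block j to the top of block j+1
            nxt = slice((2 * j + 2) * m, (2 * j + 3) * m)
            R[top, nxt], R[bot, nxt] = V[j][:m], V[j][-m:]
        if j > 0:      # W_j couples block j to the bottom of block j-1
            prv = slice((2 * j - 1) * m, 2 * j * m)
            R[top, prv], R[bot, prv] = W[j][:m], W[j][-m:]
    y = np.linalg.solve(R, g)

    # Recovery: X_j = G_j - V_j X_{j+1}^(t) - W_j X_{j-1}^(b), applied to all
    # rows of the block, so the interior rows X_j' are obtained as a subset.
    X = np.empty((n, s))
    for j in range(p):
        Xj = G[j].copy()
        if j < p - 1:
            Xj -= V[j] @ y[(2 * j + 2) * m:(2 * j + 3) * m]
        if j > 0:
            Xj -= W[j] @ y[(2 * j - 1) * m:2 * j * m]
        X[j * k:(j + 1) * k] = Xj
    return X

# Banded test problem with a boosted diagonal (nonsingular diagonal blocks).
rng = np.random.default_rng(0)
n, p, m, s = 18, 3, 2, 2
A = np.triu(np.tril(rng.standard_normal((n, n)), m), -m) + 10 * np.eye(n)
F = rng.standard_normal((n, s))
X = spike_solve(A, F, p, m)
print(np.allclose(A @ X, F))  # residual check against the original system
```

The recovery loop updates whole blocks for brevity; restricting it to the interior rows gives exactly the formulas above, since the top and bottom rows are already known from the reduced system.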

SPIKE as a polyalgorithm

Despite being logically divided into two stages, computationally, the SPIKE algorithm comprises three stages:

LU-factorizing the diagonal blocks,
computing the spikes,
solving the reduced system.

Each of these stages can be accomplished in several ways, allowing a multitude of variants. Two notable variants are the recursive SPIKE algorithm for non-diagonally-dominant cases and the truncated SPIKE algorithm for diagonally-dominant cases. In particular, the former uses LU factorization without pivoting, combined with a diagonal boosting strategy to handle the cases where the block diagonal matrix $D$ is singular; it can also serve as a preconditioner for iterative schemes such as Krylov subspace methods and iterative refinement.

Implementations

Intel offers an implementation of the SPIKE algorithm under the name Intel Adaptive Spike-Based Solver.^[1]

References

doi:10.1016/j.parco.2005.07.005
doi:10.1016/j.compfluid.2005.07.005
[1] "Intel Adaptive Spike-Based Solver - Intel Software Network". Retrieved 2009-03-23.