
Analysis of parallel algorithms

From Wikipedia, the free encyclopedia

This article discusses the analysis of parallel algorithms running on parallel random-access machines (PRAMs): models of parallel computation where multiple processors have shared access to a random-access memory (RAM).

Overview

Suppose computations are executed on a PRAM using p processors. Let T_p denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:

  • The work of a computation executed by p processors is the total number of primitive operations that the processors perform.[1] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T_1.
  • The span is the length of the longest series of operations that have to be performed sequentially due to data dependencies (the critical path). Minimizing the span is an important activity in designing parallel algorithms, because the span determines the shortest possible execution time.[2] Alternatively, the span can be defined as the time T_∞ spent computing using an idealized machine with an infinite number of processors.[3] A short sketch after this list illustrates work and span for a simple divide-and-conquer summation.
  • The cost of the computation is the quantity pT_p. This expresses the total time spent, by all processors, in both computing and waiting.[1]
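As a concrete illustration, the following sketch counts the work and span of a divide-and-conquer parallel sum of n numbers, measured in primitive additions. The decomposition and function names are illustrative assumptions, not taken from the cited sources.

    # Illustrative sketch (not from the cited sources): work and span of a
    # divide-and-conquer parallel sum of n numbers, counted in additions.

    def work(n):
        # Every element is combined exactly once, so T_1 = n - 1 additions.
        return max(n - 1, 0)

    def span(n):
        # The two halves can be summed in parallel; only the combining
        # addition lies on the critical path, so T_inf grows as ceil(log2 n).
        if n <= 1:
            return 0
        return 1 + span((n + 1) // 2)

    for n in (2, 8, 1024):
        print(n, work(n), span(n))  # e.g. 1024 -> work 1023, span 10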

Several useful results follow from the definitions of work, span and cost:

  • Work law. The cost is always at least the work: pT_p ≥ T_1. This follows from the fact that p processors can perform at most p operations in parallel.[3][1]
  • Span law. A finite number p of processors cannot outperform an infinite number, so that T_p ≥ T_∞.[3] A small numeric check of both laws follows this list.
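As a minimal sanity check, the following sketch tests whether measured running times respect the two laws; T1, Tinf and the (p, Tp) pairs are made-up illustrative values.

    # Illustrative check (made-up numbers): do measured times obey the
    # work law (p*Tp >= T1) and the span law (Tp >= Tinf)?

    def satisfies_laws(T1, Tinf, p, Tp):
        work_law = p * Tp >= T1   # cost is at least the work
        span_law = Tp >= Tinf     # p processors cannot beat infinitely many
        return work_law and span_law

    T1, Tinf = 1000.0, 10.0
    for p, Tp in [(1, 1000.0), (10, 105.0), (100, 20.0)]:
        print(p, Tp, satisfies_laws(T1, Tinf, p, Tp))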

Using these definitions and laws, the following measures of performance can be given:

  • Speedup is the gain in speed made by parallel execution compared to sequential execution: S_p = T_1 ∕ T_p. When the speedup is Ω(p) for p processors (using big O notation), the speedup is linear, which is optimal in the PRAM model because the work law implies that T_1 ∕ T_p ≤ p (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T_1 ∕ T_p = p is called perfect linear speedup.[3] An algorithm that exhibits linear speedup is said to be scalable.[1]
  • Efficiency is the speedup per processor, S_p ∕ p.[1]
  • Parallelism is the ratio T_1 ∕ T_∞. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T_1 ∕ T_∞, then T_1 ∕ T_p ≤ T_1 ∕ T_∞ < p.[3]
  • The slackness is T_1 ∕ (pT_∞). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.[3] A sketch after this list evaluates these measures for example values.
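The sketch below evaluates the four measures from assumed values of T_1, T_∞, T_p and p; the numbers are illustrative only.

    # Illustrative sketch (made-up numbers): the performance measures above.

    def measures(T1, Tinf, p, Tp):
        speedup = T1 / Tp
        efficiency = speedup / p
        parallelism = T1 / Tinf        # upper bound on speedup for any p
        slackness = T1 / (p * Tinf)    # < 1 rules out perfect linear speedup
        return speedup, efficiency, parallelism, slackness

    print(measures(T1=1000.0, Tinf=10.0, p=10, Tp=105.0))
    # -> speedup ~9.52, efficiency ~0.95, parallelism 100.0, slackness 10.0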

Execution on a limited number of processors

Any computation that can run in parallel on N processors can be executed on p < N processors by letting each processor execute multiple units of work. A result called Brent's law states that one can perform such a "simulation" in time[4]

T_p ≤ T_N + (T_1 − T_N) ∕ p

or, less precisely,[1]

T_p ≤ T_1 ∕ p + T_∞.
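A small sketch of both forms of the bound, with made-up numeric values, is given below; variable names such as TN and Tinf are illustrative.

    # Illustrative sketch (made-up numbers): the two forms of Brent's bound.

    def brent_bound(T1, TN, p):
        # Time to simulate an N-processor schedule on p < N processors.
        return TN + (T1 - TN) / p

    def brent_bound_loose(T1, Tinf, p):
        # Coarser form, using the span instead of T_N.
        return T1 / p + Tinf

    T1, TN, Tinf, p = 1000.0, 12.0, 10.0, 8
    print(brent_bound(T1, TN, p))          # 135.5
    print(brent_bound_loose(T1, Tinf, p))  # 135.0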

References

  1. Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10.
  2. McCool, Michael; Reinders, James; Robison, Arch (2013). Structured Parallel Programming: Patterns for Efficient Computation. Elsevier. pp. 4–5.
  3. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. pp. 779–784. ISBN 0-262-03384-4.
  4. Gustafson, John L. (2011). "Brent's Theorem". Encyclopedia of Parallel Computing. pp. 182–185.