Analysis of parallel algorithms


This article discusses the analysis of parallel algorithms running on parallel random-access machines (PRAMs): models of parallel computation where an unbounded number of processors have shared access to a random-access memory (RAM).

Overview

Suppose computations are executed on a PRAM using p processors. Let Tp denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:

  • The work of a computation executed by p processors is the total number of primitive operations that the processors perform.[1] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T1.
  • The span is the time T∞ spent computing using an idealized machine with an infinite number of processors.[2] (A sketch computing work and span for a small example follows this list.)
  • The cost of the computation is the quantity pTp. This expresses the total time spent, by all processors, in both computing and waiting.[1]
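
Work and span can be made concrete by viewing a computation as a directed acyclic graph (DAG) of primitive operations, each depending on the results of earlier operations. The following Python sketch is illustrative only (the DAG encoding and function names are assumptions, not taken from the cited sources); it measures work as the number of operations and span as the longest dependency chain, for a balanced-tree summation of eight values:

    # Model a computation as a DAG: each operation is mapped to the
    # operations whose results it consumes.
    def work(dag):
        # Work T1: total number of primitive operations.
        return len(dag)

    def span(dag):
        # Span T∞: length of the longest chain of dependent operations,
        # i.e. the time needed with unboundedly many processors.
        depth = {}
        def d(node):
            if node not in depth:
                depth[node] = 1 + max((d(pred) for pred in dag[node]), default=0)
            return depth[node]
        return max(d(n) for n in dag)

    # Balanced-tree sum of 8 inputs: 4 + 2 + 1 = 7 additions in 3 levels.
    dag = {
        "a0": [], "a1": [], "a2": [], "a3": [],  # level 1: four adds
        "b0": ["a0", "a1"], "b1": ["a2", "a3"],  # level 2: two adds
        "c0": ["b0", "b1"],                      # level 3: one add
    }
    print(work(dag))  # 7, so T1 = 7
    print(span(dag))  # 3, so T∞ = 3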

Several useful results follow from the definitions of work, span and cost:

  • Work law. The cost is always at least the work: pTp ≥ T1. This follows from the fact that p processors can perform at most p operations in parallel.[2][1]
  • Span law. A finite number p of processors cannot outperform an infinite number, so that Tp ≥ T∞.[2] (The two laws are combined in the sketch following this list.)
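
Together, the two laws bound the running time from below: Tp ≥ max(T1 / p, T∞). A minimal Python sketch of this combined bound, continuing the tree-sum example above (the function name is an assumption):

    def tp_lower_bound(t1, t_inf, p):
        # Work law: p * Tp >= T1, hence Tp >= T1 / p.
        # Span law: Tp >= T∞.
        return max(t1 / p, t_inf)

    # Tree sum above: T1 = 7, T∞ = 3.
    for p in (1, 2, 4, 8):
        print(p, tp_lower_bound(7, 3, p))
    # p=1: 7.0, p=2: 3.5, p=4: 3, p=8: 3; once p is large enough,
    # the span, not the work, limits the running time.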

Using these definitions and laws, the following measures of performance can be given:

  • Speedup is the gain in speed made by parallel execution compared to sequential execution: Sp = T1 / Tp. When the speedup is Ω(n) for input size n (in asymptotic notation), the speedup is linear, which is optimal in the PRAM model (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T1 / Tp = p is called perfect linear speedup.[2] An algorithm that exhibits linear speedup is said to be scalable.[1] (A sketch computing speedup and efficiency follows this list.)
  • Efficiency is the speedup per processor, Sp / p.[1]
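
As a final illustration, a short Python sketch computing both measures (the running times are made-up figures for illustration, not taken from the cited sources):

    def speedup(t1, tp):
        # Sp = T1 / Tp: gain of parallel over sequential execution.
        return t1 / tp

    def efficiency(t1, tp, p):
        # Sp / p: speedup per processor, at most 1 in the PRAM model.
        return speedup(t1, tp) / p

    # Suppose T1 = 7 and a run on p = 2 processors takes Tp = 4.
    print(speedup(7, 4))        # 1.75
    print(efficiency(7, 4, 2))  # 0.875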

References

  1. Casanova, Henri; Legrand, Arnaud; Robert, Yves (2008). Parallel Algorithms. CRC Press. p. 10.
  2. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.