Randomized algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Arvindn (talk | contribs) at 18:15, 29 February 2004. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A randomized algorithm is an algorithm which is allowed to flip a truly random coin. In practice, this means that the machine implementing the algorithm has access to a pseudo-random number generator.

As a motivating example, consider the problem of finding an 'a' in an array of n elements, given that half are 'a's and the other half are 'b's. The obvious approach is to look at each element of the array in turn, but this takes a long time (n/2 + 1 operations) if the array happens to hold all its 'b's first, followed by the 'a's. There is a similar drawback with checking in the reverse order, or checking every second element. In fact, for any strategy in which the order of checking the elements is fixed, i.e., a deterministic algorithm, there is some input that forces the algorithm to examine half the array. On the other hand, if we check array elements at random, then we will quickly find an 'a' with high probability, whatever the input.
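The random-probing strategy above can be sketched as follows (a minimal illustration in Python; the function name and the test array are chosen for the example):

```python
import random

def find_a(arr):
    """Probe random positions until an 'a' is found.

    With half the elements equal to 'a', each probe succeeds with
    probability 1/2, so the expected number of probes is 2,
    no matter how the input is ordered.
    """
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i

# The adversarial ordering for a forward scan: all 'b's first.
arr = ['b'] * 500 + ['a'] * 500
print(arr[find_a(arr)])  # 'a'
```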

Thus, for randomized algorithms, it is the "average case" running time that we are typically interested in, rather than the "worst case" behavior. The worst case is assumed to be so unlikely to occur that it can be neglected.

In the example above, the randomized algorithm always outputs the correct answer; there is merely a small probability of its taking a long time to execute. Sometimes we instead want the algorithm to always complete quickly, but allow a small probability of error. Algorithms of the former type are called Las Vegas algorithms, and those of the latter type Monte Carlo algorithms. Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm, by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time.
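This conversion can be illustrated on the same search problem (a hedged sketch; the probe budget and the arbitrary fallback answer are choices of the example, not part of any standard API):

```python
import random

def las_vegas_find(arr):
    # Las Vegas: always correct, but the running time is random.
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i

def monte_carlo_find(arr, max_probes):
    # Monte Carlo: bounded time, small probability of a wrong answer.
    for _ in range(max_probes):
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i
    return 0  # arbitrary (possibly incorrect) answer once the budget runs out

arr = ['b', 'a'] * 500
# With k probes and half the elements 'a', the failure probability is
# (1/2)**k; k = 20 already makes it about one in a million.
i = monte_carlo_find(arr, 20)
```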

Computational complexity theory, the theory of efficiency of algorithms, studies Monte Carlo algorithms. The most basic randomized complexity class is RP, which is the class of decision problems for which there is an efficient (polynomial time) algorithm which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class is Co-RP, and the class of problems for which both YES and NO answers are allowed to be probabilistic is BPP.

Quicksort is probably the most familiar "real-world" algorithm in which randomness is very useful. The deterministic version of this algorithm requires O(n^2) time to sort n numbers for some inputs -- such as an array whose elements are already in sorted order! Randomly permuting the array elements before running the algorithm, or equivalently choosing pivots at random, ensures an expected running time of O(n log n) on every input. The difference between the two is crucial when sorting large datasets.
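A randomized quicksort along these lines might be sketched as follows (an illustrative out-of-place version, not the canonical in-place partitioning; it picks the pivot at random rather than pre-permuting, which gives the same expected behaviour):

```python
import random

def quicksort(arr):
    """Randomized quicksort.

    Choosing the pivot uniformly at random makes the expected running
    time O(n log n) on every input, including already-sorted arrays,
    which are a worst case for a fixed-pivot version.
    """
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    less    = [x for x in arr if x < pivot]
    equal   = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```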

Historically, the study of randomized algorithms was spurred by the discovery by Miller and Rabin in the early 80s that the problem of determining the primality of a number can be solved efficiently by a randomized algorithm. At that time, no deterministic algorithm for primality was known.

In practice, there is no penalty associated with accepting a small probability of error, since with a little care the probability of error can be made astronomically small. Indeed, even though a deterministic polynomial-time primality test is now known, it has not replaced the older probabilistic tests in cryptographic software, nor is it expected to in the foreseeable future.
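The probabilistic test in question is the Miller-Rabin test mentioned above. A common textbook sketch (parameter names and the round count k are illustrative) shows how the error is driven down by repetition:

```python
import random

def is_probable_prime(n, k=20):
    """Miller-Rabin probabilistic primality test (textbook sketch).

    A composite n passes a single round with probability at most 1/4,
    so k independent rounds leave an error probability below 4**-k --
    astronomically small for modest k.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as 2**r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)          # modular exponentiation a**d mod n
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True
```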