Binary GCD algorithm

The binary GCD algorithm, also known as Stein's algorithm, is an algorithm that computes the greatest common divisor of two nonnegative integers. Stein's algorithm uses simpler arithmetic operations than the conventional Euclidean algorithm; it replaces division with arithmetic shifts, comparisons, and subtraction. Although the algorithm was first published by the Israeli physicist and programmer Josef Stein in 1967,^[1] it may have been known in 1st-century China.^[2]

Algorithm

The algorithm reduces the problem of finding the GCD by repeatedly applying these identities:

gcd(0, v) = v, because everything divides zero, and v is the largest number that divides v. Similarly, gcd(u, 0) = u. gcd(0, 0) is not typically defined, but it is convenient to set gcd(0, 0) = 0.
If u and v are both even, then gcd(u, v) = 2·gcd(u/2, v/2), because 2 is a common divisor.
If u is even and v is odd, then gcd(u, v) = gcd(u/2, v), because 2 is not a common divisor. Similarly, if u is odd and v is even, then gcd(u, v) = gcd(u, v/2).
If u and v are both odd, and u ≥ v, then gcd(u, v) = gcd((u − v)/2, v). If both are odd and u < v, then gcd(u, v) = gcd((v − u)/2, u). These are combinations of one step of the simple Euclidean algorithm, which uses subtraction at each step, and an application of step 3 above. The division by 2 results in an integer because the difference of two odd numbers is even.^[3]
Repeat steps 2–4 until u = v, or (one more step) until u = 0. In either case, the GCD is 2^kv, where k is the number of common factors of 2 found in step 2.

The algorithm requires O(n²)^[4] worst-case time, where n is the number of bits in the larger of the two numbers. Although each step reduces at least one of the operands by at least a factor of 2, the subtract and shift operations take linear time for very large integers (although they're still quite fast in practice, requiring about one operation per word of the representation).

An extended binary GCD, analogous to the extended Euclidean algorithm, is given by Knuth along with pointers to other versions.^[5]

Implementation

Recursive version in C

Following is a recursive implementation of the algorithm in C. The implementation is similar to the description of the algorithm given above. It use two arguments u and v. All but one of the recursive calls are tail recursive.

unsigned int gcd(unsigned int u, unsigned int v)
{
    // simple cases (termination)
    if (u == v)
        return u;

    if (u == 0)
        return v;

    if (v == 0)
        return u;

    // look for factors of 2
    if (~u & 1) // u is even
    {
        if (v & 1) // v is odd
            return gcd(u >> 1, v);
        else // both u and v are even
            return gcd(u >> 1, v >> 1) << 1;
    }

    if (~v & 1) // u is odd, v is even
        return gcd(u, v >> 1);

    // reduce larger argument
    if (u > v)
        return gcd((u - v) >> 1, v);

    return gcd((v - u) >> 1, u);
}

Iterative version in C

Following is an implementation of the algorithm in C, taking two (non-negative) integer arguments u and v. It first removes all common factors of 2 using identity 2, then computes the GCD of the remaining numbers using identities 3 and 4, and combines these to form the final answer.

unsigned int gcd(unsigned int u, unsigned int v)
{
  int shift;

  /* GCD(0,v) == v; GCD(u,0) == u, GCD(0,0) == 0 */
  if (u == 0) return v;
  if (v == 0) return u;
 
  /* Let shift := lg K, where K is the greatest power of 2
        dividing both u and v. */
  for (shift = 0; ((u | v) & 1) == 0; ++shift) {
         u >>= 1;
         v >>= 1;
  }
 
  while ((u & 1) == 0)
    u >>= 1;
 
  /* From here on, u is always odd. */
  do {
       /* remove all factors of 2 in v -- they are not common */
       /*   note: v is not zero, so while will terminate */
       while ((v & 1) == 0)  /* Loop X */
           v >>= 1;

       /* Now u and v are both odd. Swap if necessary so u <= v,
          then set v = v - u (which is even). For bignums, the
          swapping is just pointer movement, and the subtraction
          can be done in-place. */
       if (u > v) {
         unsigned int t = v; v = u; u = t;}  // Swap u and v.
       v = v - u;                       // Here v >= u.
     } while (v != 0);

  /* restore common factors of 2 */
  return u << shift;
}

Implementation in assembly

This version of binary GCD in ARM assembly (using GNU Assembler syntax) highlights the benefit of branch predication, showing that the advantage of binary GCD over the Euclidean algorithm lies in its optimizability for real-world machines. The loop to implement step 2 consists of three instructions, all predicated. Step 3 consists of two loops, each 2 instructions long (one of the instructions being predicated); however, after the first iteration r0 is kept odd and need not be tested, and only one of the loops is executed. (On cores that implement the clz instruction, steps 2 and 3 can be completed without looping.) Finally, step 4 takes four instructions of which 2 are predicated.

Since u and v are guaranteed even or odd at certain points, their least significant bits (LSBs) can be considered part of the program state and need not be stored in the registers. The evenness tests then become a side effect of the bit shifts, since the discarded LSB can be stored into the carry flag for testing. Thus the code works with u/2 and v/2, which are known at the start of each loop to be even or odd.

                              @ Arguments arrive in registers r0 and r1
gcd:
   subs    r3, r0, r0         @ Power-of-two counter = 0, carry flag = 1
   orrs    r2, r0, r1         @ Logical-OR r0 and r1, set flags on result
                              @ Carry flag remains set. If r0 and r1 are
                              @ both zero, this loop does nothing and the
                              @ code exits with r0 = 0.
remove_twos_loop:
   movnes  r2, r2, lsr #1     @ Shift r2 right if >0, carry flag = LSB
   addcc   r3, r3, #1         @ If the LSB was 0 then add 1 to the counter
   bcc     remove_twos_loop   @ and loop to try the next bit (terminates)
   movs    r0, r0, lsr r3     @ else divide r0 by 2^r3 and test result
   movnes  r1, r1, lsr r3     @ if r0 > 0 divide r1 by 2^r3 and test result
   beq     finish             @ if either is zero sum inputs and exit
                              @ lsl r3 at finish undoes movs above

                              @ Now the LSB of either r0 or r1 is 1,
                              @ and u and v are considered to be even.
                              @ But starting when we reach the subs below, 
                              @ u > 0; v > 0; r0 = u / 2; r1 = v / 2.
                              @ The LSBs of u and v are tested in the carry
                              @ flag, then memorized by the program state.
check_two_r0:
   movs    r0, r0, lsr #1     @ divide u by 2 by shifting r0 right
   bcc     check_two_r0       @ repeat until u is odd (loop terminates)
check_two_r1:                 @ Loop X:
   movs    r1, r1, lsr #1     @ divide v by 2 by shifting r2 right
   bcc     check_two_r1       @ repeat until v is odd (loop terminates)

                              @ u0 := 2 * r0 + 1, v0 := 2 * r1 + 1
   subs    r1, r1, r0         @ v := v0 - u0, even and possibly zero
   addcc   r0, r0, r1         @ if v0 < u0, u := u + v, i.e. u := u0 + (v0 - u0)
                              @ i.e. u := min(u0, v0), odd since both u0 and v0 were
   rsbcc   r1, r1, #0         @ if v0 < u0, v := 0 - v, i.e. v := |v0 - u0|
   bne     check_two_r1       @ if v > 0, we need to make it odd
                              @ shift bits out of r1 up to the first 1 bit.

                              @ if v = 0, the carry flag is set (from subs)
   adc     r0, r0, r0         @ Restore u; u = 2(r0) + 1
finish:
   orr     r0, r1, r0, lsl r3 @ multiply u by 2^r3 by shifting left
   bx      lr                 @ return to caller with result in r0.

Efficiency

Akhavi and Vallée proved that binary GCD can be about 60% more efficient (in terms of the number of bit operations) on average than the Euclidean algorithm.^[6]^[7]^[8] However, although this algorithm outperforms the traditional Euclidean algorithm, its asymptotic performance is the same, and it is considerably more complex thanks to the availability of division instruction in all modern microprocessors.

In addition, real computers operate on more than one bit at a time, and even assembly language binary GCD implementations have to compete against carefully designed hardware circuits for integer division. Overall, Knuth (1998) reports a 15% gain over Euclidean GCD,^[2] and according to one comparison, the greatest gain was about 60%, while on some popular architectures even good implementations of binary GCD were marginally slower than the Euclidean algorithm.^[9]

In general, with implementations of binary GCD similar to the above C code, the gain in speed over the Euclidean algorithm is always less in practice than in theory. The reason is that the code uses many data-dependent branches. Many branches may be removed by computing min(a,b) and |a-b| using mixtures of Boolean algebra and arithmetic.

The only data-dependent branch that these techniques do not remove is the loop condition marked Loop X, which can be replaced by a single count trailing zeros (CTZ) operation and shift. Depending on platform, CTZ may be performed either by a single hardware instruction, by an equivalent instruction sequence, or with the aid of a lookup table.^[9]

Iterative version in C++ using ctz (count trailing zeros)

Using a count trailing zeros (CTZ) function can improve the performance of the binary gcd algorithm. A randomly distributed number has an exponentially declining distribution of trailing zeros: 0.50 have no trailing zeros, 0.25 have 1 trailing zeros, 0.125 have 2 trailing zeros, 0.0625 have 3 trailing zeros, .... Iterations are only saved if there are two or more trailing zeros, so the expected number of saved iterations is not large. Although there are occasions where counting trailing zeros can gain a lot in one step, big gains would not be expected unless the numbers had unusual properties.

// ctz(x) counts trailing zeros in x

unsigned int gcd(unsigned int x, unsigned int y)
{
    if (x == 0)
        return y;

    if (y == 0)
        return x;

    unsigned int cf2 = ctz(x | y);

    x >>= ctz(x);

    for (;;)
    {
        y >>= ctz(y);

        if (x == y)
            break;

        if (x > y)
            std::swap(x, y);

        if (x == 1)
            break;

        y -= x;
    }

    return x << cf2;
}

Historical description

An algorithm for computing the GCD of two numbers was described in the ancient Chinese mathematics book The Nine Chapters on the Mathematical Art. The original algorithm was used to reduce a fraction. The description reads:

"If possible halve it; otherwise, take the denominator and the numerator, subtract the lesser from the greater, and do that alternately to make them the same. Reduce by the same number."

This just looks like a normal Euclidian algorithm, but the ambiguity lies in the phrase "if possible halve it". The traditional interpretation is that this only works when 'both' numbers to begin with are even, under which the algorithm is just a slightly inferior Euclidean algorithm (for using subtraction instead of division). But the phrase may as well mean dividing by 2 should 'either' of the numbers become even, in which case it is the binary GCD algorithm.

References

^ Stein, J. (1967), "Computational problems associated with Racah algebra", Journal of Computational Physics, 1 (3): 397–405, ISSN 0021-9991
^ ^a ^b Knuth, Donald (1998), Seminumerical Algorithms, The Art of Computer Programming, vol. 2 (3rd ed.), Addison-Wesley, ISBN 0-201-89684-2
^ In fact, the algorithm might be improved by the observation that if both u and v are odd, then exactly one of u + v or u−v must be divisible by four. Specifically, assuming u ≥ v, if ((u xor v) and 2) = 2, then gcd(u, v) = gcd((u + v)/4, v), and otherwise gcd(u, v) = gcd((u − v)/4, v).
^ http://gmplib.org/manual/Binary-GCD.html
^ Knuth 1998, p. 646, answer to exercise 39 of section 4.5.2
^ Akhavi, Ali; Vallée, Brigitte (2000), "AverageBit-Complexity of Euclidean Algorithms", Proceedings ICALP'00, Lecture Notes Computer Science 1853: 373–387, CiteSeerX: 10.1.1.42.7616
^ Brent, Richard P. (2000), "Twenty years' analysis of the Binary Euclidean Algorithm", Millenial Perspectives in Computer Science: Proceedings of the 1999 Oxford-Microsoft Symposium in honour of Professor Sir Antony Hoare, Palgrave, NY: 41–53 proceedings edited by J. Davies, A. W. Roscoe and J. Woodcock.
^ Notes on Programming by Alexander Stepanov
^ ^a ^b Faster implementations of binary GCD algorithm (download GCD.zip)

External links

NIST Dictionary of Algorithms and Data Structures: binary GCD algorithm
Cut-the-Knot: Binary Euclid's Algorithm at cut-the-knot
Analysis of the Binary Euclidean Algorithm (1976), a paper by Richard P. Brent, including a variant using left shifts
"Dynamics of the Binary Euclidean Algorithm: Functional Analysis and Operators" (1998), a paper by Brigitte Vallée.
Online gcd calculator(4 methods)

[1] Stein, J. (1967), "Computational problems associated with Racah algebra", Journal of Computational Physics, 1 (3): 397–405, ISSN 0021-9991

[Knuth98-2] Knuth, Donald (1998), Seminumerical Algorithms, The Art of Computer Programming, vol. 2 (3rd ed.), Addison-Wesley, ISBN 0-201-89684-2

[3] In fact, the algorithm might be improved by the observation that if both u and v are odd, then exactly one of u + v or u−v must be divisible by four. Specifically, assuming u ≥ v, if ((u xor v) and 2) = 2, then gcd(u, v) = gcd((u + v)/4, v), and otherwise gcd(u, v) = gcd((u − v)/4, v).

[gmplib-4] ttp://gmplib.org/manual/Binary-GCD.html

[5] Knuth 1998, p. 646, answer to exercise 39 of section 4.5.2

[6] Akhavi, Ali; Vallée, Brigitte (2000), "AverageBit-Complexity of Euclidean Algorithms", Proceedings ICALP'00, Lecture Notes Computer Science 1853: 373–387, CiteSeerX: 10.1.1.42.7616

[7] Brent, Richard P. (2000), "Twenty years' analysis of the Binary Euclidean Algorithm", Millenial Perspectives in Computer Science: Proceedings of the 1999 Oxford-Microsoft Symposium in honour of Professor Sir Antony Hoare, Palgrave, NY: 41–53 proceedings edited by J. Davies, A. W. Roscoe and J. Woodcock.

[Stepanov-8] Notes on Programming by Alexander Stepanov

[boost-9] Faster implementations of binary GCD algorithm (download GCD.zip)

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

v t e Number-theoretic algorithms
Primality tests	AKS APR Baillie–PSW Elliptic curve Pocklington Fermat Lucas Lucas–Lehmer Lucas–Lehmer–Riesel Proth's theorem Pépin's Quadratic Frobenius Solovay–Strassen Miller–Rabin
Prime-generating	Sieve of Atkin Sieve of Eratosthenes Sieve of Pritchard Sieve of Sundaram Wheel factorization
Integer factorization	Continued fraction (CFRAC) Dixon's Lenstra elliptic curve (ECM) Euler's Pollard's rho p − 1 p + 1 Quadratic sieve (QS) General number field sieve (GNFS) Special number field sieve (SNFS) Rational sieve Fermat's Shanks's square forms Trial division Shor's
Multiplication	Ancient Egyptian Long Karatsuba Toom–Cook Schönhage–Strassen Fürer's
Euclidean division	Binary Chunking Fourier Goldschmidt Newton-Raphson Long Short SRT
Discrete logarithm	Baby-step giant-step Pollard rho Pollard kangaroo Pohlig–Hellman Index calculus Function field sieve
Greatest common divisor	Binary Euclidean Extended Euclidean Lehmer's
Modular square root	Cipolla Pocklington's Tonelli–Shanks Berlekamp
Other algorithms	Chakravala Cornacchia Exponentiation by squaring Integer square root Integer relation (LLL; KZ) Modular exponentiation Montgomery reduction Schoof Trachtenberg system
Italics indicate that algorithm is for numbers of special forms