Talk:Binary GCD algorithm
This is the talk page for discussing improvements to the Binary GCD algorithm article. This is not a forum for general discussion of the article's subject.
Article policies
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL
Archives: 1 (auto-archiving period: 2 years)
This article is within the scope of WikiProject Computing (Unassessed).
ARM implementation
The ARM implementation lacks the initial u = 0 test. The code also does not appear to make good use of the conditional instruction set, so I have some optimizations to suggest: — Preceding unsigned comment added by 80.175.250.218 (talk) 17:09, 31 October 2005 (UTC)
remove_twos_loop optimization
gcd:                             @ Arguments arrive in registers r0 and r1
    mov   r3, #0                 @ Initialize r3, the power of 2 in the result
    orr   r12, r0, r1            @ r12 = (r0 | r1)
    tst   r12, #1                @ Test least significant bit of (r0 | r1)
    bne   gcd_loop               @ If nonzero, either r0 or r1 are odd, jump into middle of next loop
remove_twos_loop:
    mov   r0, r0, lsr #1         @ Shift r0 right, dividing it by 2
    mov   r1, r1, lsr #1         @ Shift r1 right, dividing it by 2
    add   r3, r3, #1             @ Increment r3
    tst   r0, #1                 @ Check least significant bit of r0
    bne   gcd_loop               @ If nonzero, r0 is odd, jump into middle of next loop
    tst   r1, #1                 @ Check least significant bit of r1
    beq   remove_twos_loop       @ If zero, r0 and r1 still even, keep looping
optimization 1
gcd:                             @ Arguments arrive in registers r0 and r1
    mov   r3, #0                 @ Initialize r3, the power of 2 in the result
    orr   r12, r0, r1            @ r12 = (r0 | r1)
remove_twos_loop:
    tst   r12, #1                @ Test least significant bit of (r0 | r1)
    moveq r0, r0, lsr #1         @ Shift r0 right, dividing it by 2
    moveq r1, r1, lsr #1         @ Shift r1 right, dividing it by 2
    moveq r12, r12, lsr #1       @ Shift r12 right, dividing it by 2
    beq   remove_twos_loop       @ If zero, r0 and r1 still even, keep looping
optimization 2
gcd:                             @ Arguments arrive in registers r0 and r1
    mov   r3, #0                 @ Initialize r3, the power of 2 in the result
remove_twos_loop:
    tst   r0, #1                 @ Test least significant bit of r0
    tsteq r1, #1                 @ Test least significant bit of r1
    moveq r0, r0, lsr #1         @ Shift r0 right, dividing it by 2
    moveq r1, r1, lsr #1         @ Shift r1 right, dividing it by 2
    beq   remove_twos_loop       @ If zero, r0 and r1 still even, keep looping
— Preceding unsigned comment added by 80.175.250.218 (talk) 17:09, 31 October 2005 (UTC)
Is that page obsolete?
I ran 10 000 000 random tests on Mac OS X 10.6.4 with the C code below and got the following results:
- Euclid: 6180 ms
- Stein: 9038 ms
and, compiling with option -O3:
- Euclid: 4728 ms
- Stein: 5302 ms
#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>
#define N 10000000
#define MAX_UINTplus1 0x100000000LLU
#define uint64 unsigned long long
#define INIT_RANDOM srandom(3)
uint64 pgcdEuclide(uint64,uint64);
uint64 pgcdStein(uint64,uint64);
int main(void) {
  int i;
  uint64 a, b, d, t0, t1;
  struct timeval tv;

  // Euclid version
  INIT_RANDOM;
  gettimeofday(&tv, 0);
  t0 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;
  for (i = 0; i < N; i++) {
    a = random() * MAX_UINTplus1 + random();
    b = random() * MAX_UINTplus1 + random();
    d = pgcdEuclide(a, b);
  }
  gettimeofday(&tv, 0);
  t1 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;
  printf("%llu, Euclid: %llu ms\n", d, (t1 - t0) / 1000);

  // Stein version
  INIT_RANDOM;
  gettimeofday(&tv, 0);
  t0 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;
  for (i = 0; i < N; i++) {
    a = (uint64)random() * MAX_UINTplus1 + random();
    b = (uint64)random() * MAX_UINTplus1 + random();
    d = pgcdStein(a, b);
  }
  gettimeofday(&tv, 0);
  t1 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;
  printf("%llu, Stein: %llu ms\n", d, (t1 - t0) / 1000);
  return 0;
}

uint64 pgcdEuclide(uint64 a, uint64 b) {
  if (b == 0) return a;
  return pgcdEuclide(b, a % b);
}

uint64 pgcdStein(uint64 u, uint64 v) {
  int shift;

  /* GCD(0,x) := x */
  if (u == 0 || v == 0)
    return u | v;

  /* Let shift := lg K, where K is the greatest power of 2
     dividing both u and v. */
  for (shift = 0; ((u | v) & 1) == 0; ++shift) {
    u >>= 1;
    v >>= 1;
  }

  while ((u & 1) == 0)
    u >>= 1;

  /* From here on, u is always odd. */
  do {
    while ((v & 1) == 0)  /* Loop X */
      v >>= 1;

    /* Now u and v are both odd, so diff(u, v) is even.
       Let u = min(u, v), v = diff(u, v)/2. */
    if (u < v) {
      v -= u;
    } else {
      uint64 diff = u - v;
      u = v;
      v = diff;
    }
    v >>= 1;
  } while (v != 0);

  return u << shift;
}
This is terrible because it is mostly the random number generator that is being timed. McKay (talk) 07:56, 11 July 2012 (UTC)
RNGs are very slow. Also note that you only went up to numbers of size 2**64, which means the arithmetic operations take constant time on 64-bit hardware. To do an empirical comparison you should use a bigint library (e.g., ints in Python) and test numbers much wider than your CPU's native ALU (or implement your own arithmetic routines so the processor's integer unit is not used directly). — Preceding unsigned comment added by 131.215.176.115 (talk) 18:06, 16 March 2013 (UTC)
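For what it's worth, one way to keep the RNG out of the measured region is to pre-generate the operands and time only the GCD calls. A minimal sketch below, reusing the pgcdStein/pgcdEuclide definitions and macros from the listing above (link against that file; the accumulator is just there so the calls are not optimized away):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 10000000
#define MAX_UINTplus1 0x100000000LLU
#define uint64 unsigned long long

uint64 pgcdEuclide(uint64, uint64);   /* defined in the listing above */
uint64 pgcdStein(uint64, uint64);     /* defined in the listing above */

int main(void) {
    uint64 *a = malloc(N * sizeof *a);   /* pre-generated operands */
    uint64 *b = malloc(N * sizeof *b);
    uint64 t0, t1, d = 0;
    struct timeval tv;
    int i;

    if (a == NULL || b == NULL)
        return 1;

    /* Generate all inputs first, outside the timed region. */
    srandom(3);
    for (i = 0; i < N; i++) {
        a[i] = (uint64)random() * MAX_UINTplus1 + random();
        b[i] = (uint64)random() * MAX_UINTplus1 + random();
    }

    /* Time only the GCD calls. */
    gettimeofday(&tv, 0);
    t0 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;
    for (i = 0; i < N; i++)
        d += pgcdStein(a[i], b[i]);      /* accumulate so the calls are not optimized away */
    gettimeofday(&tv, 0);
    t1 = tv.tv_sec * 1000 * 1000 + tv.tv_usec;

    printf("%llu, Stein: %llu ms\n", d, (t1 - t0) / 1000);
    free(a);
    free(b);
    return 0;
}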
As McKay mentioned, you are timing your RNG, so the microbenchmark is invalid. I made a more precise benchmark using the criterion statistical tool while evaluating Stein's algorithm for use in an integer-factoring application, and found it to be ~60% faster than Euclid's on random 64-bit numbers. Interestingly, this matches the theory: Stein's algorithm uses 60% fewer bit operations than Euclid's.[1] Nicoonoclaste (talk) 10:06, 3 November 2020 (UTC)
ctz version
I think the unreferenced section "Iterative version in C++ using ctz (count trailing zeros)" should also be removed. It makes vague claims of efficiency, but as far as I can tell these rely on hardware parallelism for ctz. If you look at Count trailing zeros, the typical (x86) ctz instructions have limited bit width, so the claims to improved performance suffer from the same problem as the ARM assembly example, i.e. they are not asymptotic. Someone not using his real name (talk) 14:26, 2 March 2014 (UTC)
- Totally agreed, went ahead and WP:BOLDly deleted this section. Count trailing zeros has more to do with compilers using the instruction as part of platform-specific optimization than with a C++ implementation, which may or may not end up using that instruction. — Dsimic (talk | contribs) 17:44, 2 March 2014 (UTC)
- Keep. The article confuses the notion of an algorithm with the bit-bumming aspects of programming. Using machine arithmetic types (int) to illustrate the algorithm is one thing, but it should not be taken so literally. The algorithm and not the programming is the important thing. The cycles in a hardware divide instruction are not interesting. The availability of an x86 instruction to count zeros is also uninteresting. GCD on arbitrary-precision numbers (bignums) is a different story; think million-digit numbers. Binary GCD can be done with shifts and adds -- operations that are linear in the number of bits. Division is not. It may be that bignum packages use binary GCD for that reason. CTZ applied to the last digit of a bignum would offer a reasonable speed optimization (and one could use a lookup table to implement CTZ). I think there's value. Glrx (talk) 23:27, 2 March 2014 (UTC)
People looking for a fast way to compute the GCD are disappointed by this article when they find out that the C code is slower than the Euclidean algorithm. It would be nice to point out that:
Actually, this C code may be faster than the Euclidean algorithm when replacing while ((v & 1) == 0) v >>= 1;
with v >>= __builtin_ctz(v);
This makes sense even for big numbers, since the time of the new version is linear in the number of bits, while the time of the first version is also multiplied by the number of trailing zeros removed.
Of course, the actual reason it is faster is the removal of a short loop that runs about twice on average and involves a lot of mispredicted branches. But there is no need to mention that; it would just be nice to have a really fast function. — Preceding unsigned comment added by 78.196.87.18 (talk) 03:39, 12 April 2020 (UTC)
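For the record, here is roughly what that suggestion looks like when applied to the 64-bit pgcdStein above; a sketch only, and GCC/Clang-specific, since for the 64-bit type the __builtin_ctzll variant is needed and it is a compiler extension rather than standard C:

#define uint64 unsigned long long

/* Binary GCD with the inner shift loops replaced by __builtin_ctzll.
   Note: __builtin_ctzll is undefined for a zero argument, so the
   u == 0 / v == 0 cases must be handled before it is called. */
uint64 pgcdSteinCtz(uint64 u, uint64 v) {
    int shift;
    if (u == 0 || v == 0)
        return u | v;                   /* gcd(0, x) = x */
    shift = __builtin_ctzll(u | v);     /* greatest power of 2 dividing both */
    u >>= __builtin_ctzll(u);           /* make u odd */
    do {
        v >>= __builtin_ctzll(v);       /* make v odd */
        if (u > v) {                    /* keep u <= v */
            uint64 t = v;
            v = u;
            u = t;
        }
        v -= u;                         /* v is now even (possibly zero) */
    } while (v != 0);
    return u << shift;
}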
78.196.87.18, unfortunately there is no way for a C implementation to be both portable and efficient (__builtin_ctz isn't part of the standard). That's part of why I replaced the iterative implementation with a Rust one, as it lends itself to faster and more readable code while still being portable. Nicoonoclaste (talk) 10:15, 3 November 2020 (UTC)
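If portability within C is the concern, one common compromise (a sketch, not a recommendation for the article) is to hide the intrinsic behind a feature check and fall back to a plain loop elsewhere; in C++20, std::countr_zero from <bit> would be a standard alternative.

#define uint64 unsigned long long

/* Count trailing zeros of a nonzero 64-bit value: use the GCC/Clang
   builtin when available, otherwise a portable shift loop. */
static int ctz64(uint64 x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(x);
#else
    int n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
#endif
}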
External links modified
Hello fellow Wikipedians,
I have just modified 2 external links on Binary GCD algorithm. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
- Added archive https://web.archive.org/web/20061002120425/http://users.info.unicaen.fr/~brigitte/Publications/icalp8-2000.ps to http://users.info.unicaen.fr/~brigitte/Publications/icalp8-2000.ps
- Added archive https://web.archive.org/web/20110513012901/http://users.info.unicaen.fr/~brigitte/Publications/bin-gcd.ps to http://users.info.unicaen.fr/~brigitte/Publications/bin-gcd.ps
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}}
(last update: 5 June 2024).
- If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
- If you found an error with any archives or the URLs themselves, you can fix them with this tool.
Cheers.—InternetArchiveBot (Report bug) 15:35, 20 July 2017 (UTC)
- ^ Akhavi, Ali; Vallée, Brigitte (2000), "Average Bit-Complexity of Euclidean Algorithms", Proceedings of ICALP 2000, Lecture Notes in Computer Science 1853: 373–387, CiteSeerX 10.1.1.42.7616, archived from the original on 2006-10-02.