
User:Wvbailey/Euclid's algorithm


A sandbox page:

Algorithms for computers: an example

In computer systems, an algorithm is an instance of logic written in software by software developers, intended to be effective for the "target" computer(s) so that the software on the target machines will produce output from given input (perhaps null).

Algorithm versus function computed by an algorithm: For a given function multiple algorithms may exist. Rogers observes that "It is . . . important to distinguish between the notion of algorithm, i.e. procedure and the notion of function computable by algorithm, i.e. mapping yielded by procedure. The same function may have several different algorithms"[1].

Computers (and computors), models of computation: A computer (or human "computor"[2]) is a restricted type of machine, a "discrete deterministic mechanical device"[3] that blindly follows its instructions[4]. Melzak's and Lambek's primitive models[5] reduced this notion to four elements: (i) discrete, distinguishable locations, (ii) discrete, indistinguishable counters (if no confusion will result, the word "counters" can be dropped, and a location can be said to contain a single "number"), (iii) an agent, and (iv) a list of instructions that are effective relative to the capability of the agent[6].
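
As a concrete illustration (the instruction names below are invented here; nothing in this sketch appears in Melzak's or Lambek's papers), the four elements can be rendered in modern code: a dictionary of distinguishable locations, non-negative counts standing in for the indistinguishable counters, a list of instructions, and a loop playing the role of the agent.

  def run(program, locations):
      """The 'agent': blindly follow the instruction list, one step at a time."""
      step = 0
      while program[step][0] != "HALT":
          op, *args = program[step]
          if op == "INC":                  # drop one counter into a location
              locations[args[0]] += 1
              step += 1
          elif op == "DEC_OR_JUMP":        # remove one counter, or jump if the location is empty
              loc, target = args
              if locations[loc] == 0:
                  step = target
              else:
                  locations[loc] -= 1
                  step += 1
          elif op == "JUMP":               # unconditional transfer of control
              step = args[0]
      return locations

  # Example: empty location "A" into location "B", one counter at a time.
  program = [
      ("DEC_OR_JUMP", "A", 3),   # 0: if A holds no counters, go to step 3 (HALT)
      ("INC", "B"),              # 1: add one counter to B
      ("JUMP", 0),               # 2: repeat
      ("HALT",),                 # 3
  ]
  print(run(program, {"A": 3, "B": 0}))    # {'A': 0, 'B': 3}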

"Elegant" programs, "good" (fast) programs: The notion of "simplicity and elegance" appears informally in Knuth and precisely in Chaitin:

". . . we want good algorithms in some loosely defined aesthetic sense. One criterion . . . is the length of time taken to perform the algorithm . . .. Other criteria are adaptability of the algorithm to computers, its simplicity and elegance, etc" (Knuth 1973:7)
"I'll show that you can't prove that a program is 'elegant,' by which I mean that it's the smallest possible program for producing the output that it does" (Chaitin 2005:32).

Unfortunately there may be a tradeoff between goodness and elegance. An example of this will be shown below.

Simulation of an algorithm: computer (computor) language: Knuth advises the reader that "the best way to learn an algorithm is to try it . . . immediately take pen and paper and work through an example"[7]. But what about a simulation or execution of the real thing? The programmer must translate the algorithm into a language that the simulator/computer/computor can effectively execute. Stone gives an example of this: when computing the roots of a quadratic equation the computor must know how to take a square root. If they do not, then to be effective relative to the capabilities of the computor the algorithm must provide a set of rules for extracting a square root[8].

This means that the programmer must know a "language" that is effective relative to the target computing agent.
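
Stone's square-root example can be made concrete. The rule set below is one possible set of rules (it is not Stone's own), and it uses only comparison and subtraction, operations already assumed effective for the computor: since 1 + 3 + 5 + ... + (2n - 1) = n², subtracting successive odd numbers from the radicand counts out the integer square root.

  def isqrt_by_subtraction(n):
      """Integer square root by repeatedly subtracting successive odd numbers.
      Relies on the identity 1 + 3 + 5 + ... + (2k - 1) = k**2."""
      if n < 0:
          raise ValueError("radicand must be a non-negative integer")
      root, odd = 0, 1
      while n >= odd:       # can the next odd number still be subtracted?
          n -= odd          # subtract it
          odd += 2
          root += 1         # each subtraction adds one unit to the root
      return root           # floor of the square root

  print(isqrt_by_subtraction(49))   # 7
  print(isqrt_by_subtraction(50))   # 7 (floor value)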

Computer(computor) language for Euclid's algorithm: Only a few instruction types are required to execute Euclid's algorithm -- assignment (replacement), GOTO, subtraction, and some logical tests.

  • A location is symbolized by upper case letter(s), e.g. S, A%, etc.
  • The varying quantity (number) in a location will be written in lower case letter(s) and (usually) associated with the location's name. For example, location L at the start might contain the number l = 3009.

Instructions: Besides INPUT, OUTPUT, and HALT only 3 types are required:

  • Logical test (conditional GOTO): IF [the test goes here] yields "true" THEN [the goto-instruction number goes here] ELSE continue in numerical sequence -- the ELSE part will be assumed and not written. Example: IF R > S THEN GOTO step 7.
  • Unconditional GOTO: GOTO [the instruction number goes here].
  • Assignment (replacement, substitution)[9]: a rectangle with the := symbol; on the left is the destination, on the right is the source location(s) with or without a formula. Examples:
S := R replaces the contents s in location S with the contents r in location R.
R := L - S subtracts the number s in location S from the number l in L and replaces the contents of R with the number l - s. Alternately, if the destination location is L (a process known as read-modify-write) then symbolically L := L - S.

Structured programming, canonical structures: Kemeny and Kurtz observe that while "undisciplined" use of GOTOs and IF-THENs can result in "spaghetti code", a programmer can write structured programs using these instructions; on the other hand, "it is also possible, and not too hard, to write badly structured programs in a structured language." In particular they mention the DO-LOOP (DO-WHILE) and the IF-THEN-ELSE structures[10]. These are two of the three Böhm-Jacopini canonical structures (DO-THEN, IF-THEN-ELSE, and WHILE-DO), the third being the simple DO-THEN sequence[11]; the set is perhaps augmented with DO-WHILE and CASE[12]. An additional benefit is a program that lends itself to proofs of correctness using mathematical induction[13].

Canonical flowchart symbols[14]: These include the basic rectangle (DO-THEN), the diamond (IF-THEN-ELSE), and the dot (OR-tie) with "directed lines" between them. Flowcharts always start at the top of a page and proceed down. "Nesting" of sub-structures in a superstructure is permitted only if a single exit occurs from the superstructure. The symbols:

  • IF-THEN-ELSE: Diamond with an arrow entering at the top and arrows leaving from the left or right vertex and from the bottom.
  • DO-THEN: An assignment-rectangle with an entry and an exit.
  • Unconditional GOTO: not recommended in structured programming but embedded in the WHILE-DO; a rectangle with a single arrow exiting the bottom and going to the instruction specified.
  • WHILE-DO: IF-THEN followed by DO-THEN ending in a GOTO that returns its arrow to the top of the diamond and OR-ties with the arrow entering it.
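
As a hedged illustration in modern code (not Kemeny and Kurtz's BASIC, and unrelated to Euclid's problem), the toy function below composes the three canonical structures, each with a single entry and a single exit: a DO-THEN sequence, a WHILE-DO whose test sits at the top like the diamond above, and an IF-THEN-ELSE whose branches rejoin before the loop's single exit.

  def sum_of_absolute_values(numbers):
      # DO-THEN: a simple sequence of assignments
      total = 0
      index = 0
      # WHILE-DO: the test plays the diamond's role; the GOTO back to the test
      # is supplied implicitly by the while statement
      while index < len(numbers):
          # IF-THEN-ELSE: two branches that rejoin at a single exit
          if numbers[index] > 0:
              total = total + numbers[index]   # THEN branch
          else:
              total = total - numbers[index]   # ELSE branch (adds the absolute value)
          index = index + 1
      return total

  print(sum_of_absolute_values([3, -1, 4]))   # 8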

Euclid’s algorithm

The example-diagram of Euclid's algorithm from T.L. Heath 1908 with more detail added. Euclid does not go beyond a third measuring and gives no numerical examples. Nicomachus gives the example of 49 and 21: "I subtract the less from the greater; 28 is left; then again I subtract from this the same 21 (for this is possible); 7 is left; I subtract this from 21, 14 is left; from which I again subtract 7 (for this is possible); 7 will be left, but 7 cannot be subtracted from 7." Heath comments that "The last phrase is curious, but the meaning of it is obvious enough, as also the meaning of the phrase about ending 'at one and the same number' " (Heath 1908:300).

Euclid’s algorithm appears as Proposition II in Book VII ("Elementary Number Theory") of his Elements [15]. Euclid poses the problem: "Given two numbers not prime to one another, to find their greatest common measure". He defines "A number [to be] a multitude composed of units": a counting number, a positive integer not including 0. And to "measure" is to place a shorter measuring length s successively (q times) along longer length l until the remaining portion r is less than the shorter length s[16]. In modern words, remainder r = l - q*s, q being the quotient, or remainder r is the "modulus", the integer-fractional part left over after the division[17].
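
As a small check of these definitions, the sketch below (modern notation, not Euclid's) "measures" 49 by 21, Nicomachus' numbers from the caption above, by repeated subtraction, and confirms that the remainder agrees with the modulus operation.

  def measure(l, s):
      """Lay length s along length l as many whole times as possible.
      Returns (q, r) with l == q*s + r and 0 <= r < s."""
      q, r = 0, l
      while r >= s:      # another whole copy of s still fits
          r -= s
          q += 1
      return q, r

  print(measure(49, 21))   # (2, 7): s is laid down q = 2 times, leaving r = 7
  print(49 % 21)           # 7: the modulus operation gives the same remainder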

For Euclid’s method to succeed, the starting lengths must satisfy three requirements: (i) the lengths must not be 0, AND (ii) the subtraction must be "proper", i.e. a test must guarantee that the smaller of the two numbers is subtracted from the larger (alternately, the two can be equal so their subtraction yields 0), AND (iii) the two lengths must not be prime to one another.

For Euclid, the first requirement was tacit and not discussed: a length of 0 would not meet the definition of "a number", i.e. a multitude of units. However, Nicomachus’ example[18] hints at the problem of a remainder equal to "zero"; in fact its equivalent is what terminates his algorithm. Heath 1908:300 reports (quoting Nicomachus) " '. . . but 7 cannot be subtracted from 7.' The last phrase is curious, but the meaning of it is obvious, as also the meaning of the phrase about ending 'at one and the same number' ".

With regard to the second requirement, Euclid begins his proof with the possibility that at the outset the two lengths are the same. Indeed if this is the case then the method ends. The computer algorithm will have to account for this possibility.

With regard to the third requirement, Euclid stipulated this so that he could construct a reductio ad absurdum proof that the two numbers' common measure is in fact the greatest. While Nicomachus' algorithm is the same as Euclid's, when the numbers are prime to one another it yields the number "1" for their common measure. So to be precise the following is really Nicomachus' algorithm.

An inelegant program for Euclid's algorithm

"Inelegant" is a translation of Knuth's version of the algorithm with a subtraction-based remainder-loop replacing his use of division (or a "modulus" instruction). Derived from Knuth 1973:2-4. "Less-inelegant" on the right computes the same function with fewer instructions, sometimes faster, sometimes slower. Depending on the two numbers, "Less-inelegant" may be faster than "Elegant".

The following algorithm is framed as Knuth's 4-step version of Euclid's and Nicomachus', but rather than using division to find the remainder it uses successive subtractions of the shorter length s from the remaining length r until r is less than s. The bold-face headings are adapted from Knuth 1973:2-4:

1 [Into two locations L and S put the numbers l and s that represent the two lengths]: INPUT L, S

E0: [Ensure r ≥ s.]

2 [Initialize R: make the remaining length r equal to the starting/initial/input length l] R := L
3 [Ensure the smaller of the two numbers is in S and the larger in R]: IF R > S THEN the contents of L is the larger number, so skip over the exchange-steps 4, 5 and 6: GOTO step 7 ELSE continue,
4 [The contents of R (and L) are less than or equal to the contents of S: Swap the contents of R and S.] L := R (this first step is redundant, but will be useful for later discussion).
5 R := S ( s is larger so it goes into R)
6 S := L ( l, and also r is smaller so it goes into S)

E1:[Find remainder]: Until the remaining length r in R is less than the shorter length s in S, repeatedly subtract the measuring number s in S from the remaining length r in R.

7 [Test the measuring-number s in S for greater than the remaining portion r in R.]: IF S > R THEN done measuring: go to step 10 ELSE measure again: continue,
8 [Measure again: Subtract smaller number s in location S from remaining portion r in R, and deposit the difference back in R, i.e. rnew := rold - s]: R := R - S
9 [Remainder-loop]: Go to step 7.

E2: [Is the remainder 0?]: After the inner-loop test terminates it sends the program forward to test whether or not the algorithm is done: EITHER (i) the last measure was exact, the remainder in R is 0, and the program can halt, OR (ii) the algorithm must continue: the last measure left a remainder in R less than the measuring number in S.

10 IF R = 0 THEN done: go to step 15 ELSE continue,

E3: [Interchange]: The nut of Euclid's algorithm. Use the remainder r to measure what was previously the smaller number s; L serves as a temporary location for the exchange: R --> L, S --> R, L --> S; then go back to E1.

11 L := R
12 R := S
13 S := L
14 [Repeat the measuring process]: Go to step 7
15 [Done: the contents of S is the greatest common divisor]: PRINT S
16 HALT, END, STOP.
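
The numbered steps above are pseudocode; as a sketch (not Knuth's text) they can be transcribed directly into runnable code. The version below drives a "program counter" pc through the same step numbers so that every GOTO stays visible, with L, S and R playing the roles of the locations above.

  def inelegant_gcd(l, s):
      """Transcription of steps 1-16; pc plays the role of the step number."""
      L, S = l, s            # step 1: INPUT L, S
      R = 0
      pc = 2
      while True:
          if pc == 2:        # step 2: R := L
              R = L; pc = 3
          elif pc == 3:      # step 3: IF R > S THEN skip the exchange (GOTO step 7)
              pc = 7 if R > S else 4
          elif pc == 4:      # step 4: L := R (redundant, kept to mirror the listing)
              L = R; pc = 5
          elif pc == 5:      # step 5: R := S
              R = S; pc = 6
          elif pc == 6:      # step 6: S := L
              S = L; pc = 7
          elif pc == 7:      # step 7 (E1): IF S > R THEN done measuring (GOTO step 10)
              pc = 10 if S > R else 8
          elif pc == 8:      # step 8: R := R - S (measure again)
              R = R - S; pc = 9
          elif pc == 9:      # step 9: remainder-loop, GOTO step 7
              pc = 7
          elif pc == 10:     # step 10 (E2): IF R = 0 THEN done (GOTO step 15)
              pc = 15 if R == 0 else 11
          elif pc == 11:     # steps 11-13 (E3): interchange, using L as the temporary
              L = R; pc = 12
          elif pc == 12:
              R = S; pc = 13
          elif pc == 13:
              S = L; pc = 14
          elif pc == 14:     # step 14: repeat the measuring process, GOTO step 7
              pc = 7
          elif pc == 15:     # steps 15-16: PRINT S, then HALT
              return S

  print(inelegant_gcd(3009, 884))     # 17
  print(inelegant_gcd(14157, 5950))   # 1 (the two numbers are relatively prime)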

An elegant program

The following version of Euclid's algorithm requires only 6 core instructions to do what 13 are required to do in the inelegant version; worse, "Inelegant" requires more types of instructions. The flowchart of "Elegant" can be found at the top of this article; it abbreviates somewhat the "old Basic" program below:

   10 REM Euclid's algorithm for greatest common divisor
   20 PRINT "Type two integers greater than 0"
   30 INPUT A%,B%
   40 IF B%=0 THEN GOTO 100
   50 IF A% > B% THEN GOTO 80
   60 LET B%=B%-A%
   70 GOTO 40
   80 LET A%=A%-B%
   90 GOTO 40
  100 PRINT A%
  110 END

How "Elegant" works: In place of an outer "Euclid loop", "Elegant" shifts back and forth between two "co-loops", an A% > B% loop that computes A% := A% - B%, and an A% ≤ B% loop that computes B% := B% - A%. This works because, when at last the minuend M is less than or equal to the subtrahend S (Difference = Minuend - Subtrahend), the minuend can become s (the new measuring length) and the subtrahend can become the new l (the length to be measured); in other words the "sense" of the subtraction reverses.
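
The same control flow can be written out in modern form as a sketch (not part of the original BASIC); the comments map each statement back to the numbered BASIC lines, with the two co-loops folded into one loop because lines 70 and 90 both return to line 40.

  def elegant_gcd(a, b):
      """Same control flow as the BASIC listing, with the GOTOs folded into one loop."""
      while b != 0:           # line 40: IF B% = 0 THEN GOTO 100
          if a > b:           # line 50: IF A% > B% THEN GOTO 80
              a = a - b       # line 80: LET A% = A% - B%  (the A% > B% co-loop)
          else:
              b = b - a       # line 60: LET B% = B% - A%  (the A% <= B% co-loop)
      return a                # line 100: PRINT A%

  print(elegant_gcd(3009, 884))   # 17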

Testing the algorithms

Does an algorithm do what its author wants it to do? The exceptional cases must be identified. Will the above algorithms perform properly when B > A, A > B, A = B? (Yes to all.) What happens when A = 0? B = 0? Both A and B = 0? (No: "Inelegant" computes forever in all of these cases; "Elegant" computes forever when A% = 0. A fully effective algorithm will have to handle these cases.)

As suggested by Knuth one can proceed with paper and pencil to do a few test cases. One source[19] uses 3009 and 884. Another interesting case is the two relatively prime numbers 14157 and 5950.
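
A sketch of those paper-and-pencil checks done by machine (modern code, not the author's BASIC): it re-implements the subtraction-only computation and compares each result against the library gcd; the zero cases are left out because, as just noted, the programs above never halt on them.

  from math import gcd

  def gcd_by_subtraction(a, b):
      """Subtraction-only gcd, valid for positive integers a and b."""
      while b != 0:
          if a > b:
              a -= b
          else:
              b -= a
      return a

  # Cases named in the text: B > A, A > B, A = B, plus the two worked pairs.
  for a, b in [(884, 3009), (3009, 884), (884, 884), (14157, 5950)]:
      result = gcd_by_subtraction(a, b)
      assert result == gcd(a, b)
      print(a, b, "->", result)       # 17, 17, 884, 1 respectively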

Proof of program correctness by use of mathematical induction: Knuth demonstrates the application of Mathematical induction to an "extended" version of Euclid's algorithm, and he proposes "a general method applicable to proving the validity of any algorithm" [20]. Tausworthe proposes that a measure of the complexity of a program be the length of its correctness proof [21].
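
Neither Knuth's nor Tausworthe's proof is reproduced here, but the kind of assertion they attach to a loop can be sketched. The invariant below, gcd(a, b) = gcd(a0, b0), is preserved by each subtraction because gcd(a, b) = gcd(a - b, b); induction on the number of passes, together with a termination argument, then yields correctness. The executable assert is only a spot check of the assertion, not a proof.

  from math import gcd   # used only to state the assertion, not by the algorithm itself

  def gcd_with_invariant(a0, b0):
      a, b = a0, b0
      while b != 0:
          # Loop invariant: (a, b) always has the same gcd as the input pair,
          # since gcd(a, b) == gcd(a - b, b) when a > b, and symmetrically otherwise.
          assert gcd(a, b) == gcd(a0, b0)
          if a > b:
              a -= b
          else:
              b -= a
      return a            # on exit b == 0, so gcd(a, 0) == a == gcd(a0, b0)

  print(gcd_with_invariant(3009, 884))   # 17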

When it comes to simulating the algorithm on a machine, can it be so general that the mechanical "platform" does not matter? In this regard, philosopher Daniel Dennett identifies a key feature of an algorithm: what he calls its "substrate neutrality":

"Substrate neutrality: an algorithm relies on its logical structure. Thus, the particular form in which an algorithm is manifested is not important"[22].

Van Emde Boas, on the other hand, observes "even if we base complexity theory on abstract instead of concrete machines, arbitrariness of the choice of a model remains. It is at this point that the notion of simulation enters"[23]. But what model should be used for the simulation? When speed is being measured, the instruction set matters. For example, "Inelegant" will execute much faster if division is available.
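
A sketch of why the available instruction set matters for speed (the counting convention here is ad hoc, one "pass" per executed remainder operation): the repeated subtraction of steps 7-9 takes as many passes as the quotient, while a single modulus (division) instruction does the same work in one pass.

  def remainder_by_subtraction(r, s):
      """Steps 7-9 above: the number of passes equals the quotient."""
      passes = 0
      while r >= s:
          r -= s
          passes += 1
      return r, passes

  def remainder_by_division(r, s):
      """One modulus instruction does the same work in a single pass."""
      return r % s, 1

  for r, s in [(3009, 884), (14157, 5950)]:
      print((r, s), remainder_by_subtraction(r, s), remainder_by_division(r, s))
  # Reducing 3009 by 884 takes 3 subtraction passes but only 1 division.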

Can the inelegant algorithm be improved?

Once the author judges the algorithm "fit" and "effective" -- that is, it does what its author wants it to do -- then the question becomes: can it be improved?

Improvement may involve accepting a tradeoff. Examination of where time is lost points to the loops, in particular an inner loop. If one were to combine the two-step sequence R := L - S followed by L := R into a single L := L - S, then "Inelegant's" execution time would be shortened. (This requires other changes; see "Less-inelegant" below.)

Given the two numbers 3009 and 884, the count is "Less-inelegant" 73, "Elegant" 71; "Less-inelegant" loses. But given 14157 and 5950 the counts are "Less-inelegant" 301, "Elegant" 321; "Elegant" loses. Which is better? Knuth's method of averaging over a large number of trials offers a possible measure of their goodness.
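
Since the full "Less-inelegant" listing is not reproduced here, the sketch below compares the two programs that are given, "Inelegant" and "Elegant", under an ad hoc counting convention (each executed test, assignment and GOTO counts as one; INPUT, PRINT and HALT are not counted). The absolute numbers therefore need not match the 73/71/301/321 quoted above; the sketch only illustrates Knuth's suggestion of averaging the counts over many trials.

  import random

  def count_elegant(a, b):
      """Executed lines 40-90 of the BASIC listing (I/O and END lines not counted)."""
      steps = 0
      while True:
          steps += 1            # line 40: IF B% = 0 THEN GOTO 100
          if b == 0:
              return steps
          steps += 1            # line 50: IF A% > B% THEN GOTO 80
          if a > b:
              a -= b
              steps += 2        # lines 80 and 90
          else:
              b -= a
              steps += 2        # lines 60 and 70

  def count_inelegant(a, b):
      """Executed steps 2-14 of the numbered listing (INPUT, PRINT, HALT not counted)."""
      L, S = a, b
      R = L
      steps = 2                 # steps 2 and 3
      if not (R > S):
          L, R, S = R, S, R     # steps 4, 5, 6: L := R, R := S, S := L
          steps += 3
      while True:
          steps += 1            # step 7: IF S > R
          if S > R:
              steps += 1        # step 10: IF R = 0
              if R == 0:
                  return steps
              L, R, S = R, S, R # steps 11, 12, 13: interchange via L
              steps += 4        # ... plus step 14 (GOTO 7)
          else:
              R = R - S
              steps += 2        # steps 8 and 9

  random.seed(0)
  pairs = [(3009, 884), (14157, 5950)] + [
      (random.randint(1, 10000), random.randint(1, 10000)) for _ in range(1000)]
  print("average, Elegant  :", sum(count_elegant(a, b) for a, b in pairs) / len(pairs))
  print("average, Inelegant:", sum(count_inelegant(a, b) for a, b in pairs) / len(pairs))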

  1. ^ Rogers 1987:1-2
  2. ^ In his essay "Calculations by Man and Machine: Conceptual Analysis" Sieg 2002:390 credits this distinction to Robin Gandy; cf Wilfried Sieg et al., 2002, Reflections on the Foundations of Mathematics: Essays in Honor of Solomon Feferman, Association for Symbolic Logic, A. K. Peters, Ltd, Natick, MA.
  3. ^ cf Gandy 1980:126: Robin Gandy, "Church's Thesis and Principles for Mechanisms", appearing on pp. 123-148 in J. Barwise et al., 1980, The Kleene Symposium, North-Holland Publishing Company.
  4. ^ A "robot": "A computer is a robot that will perform any task that can be described as a sequence of instructions" cf Stone 1972:3
  5. ^ Lambek’s "abacus" is a "countably infinite number of locations (holes, wires etc.) together with an unlimited supply of counters (pebbles, beads, etc). The locations are distinguishable, the counters are not". The holes will have unlimited capacity, and standing by is an agent who understands and is able to carry out the list of instructions" Lambek 1961:295. Lambek references Melzak who defines his Q-machine as "an indefinitely large number of locations . . . an indefinitely large supply of counters distributed among these locations, a program, and an operator whose sole purpose is to carry out the program." (Melzak 1961:283) B-B-J (loc. cit.) add the stipulation that the holes are "capable of holding any number of stones" (p. 46). Both Melzak and Lambek appear in The Canadian Mathematical Bulletin, vol. 4, no. 3, September 1961.
  6. ^ "We say that an instruction is effective if there is a procedure that the robot can follow in order to determine precisely how to obey the instruction" Stone 1972:6
  7. ^ Knuth 1973:4
  8. ^ Stone 1972:5. Methods for extracting roots are not trivial: see Methods of computing square roots
  9. ^ Knuth 1973:3: "The all important replacement operation (sometimes called assignment or substitution)"; Knuth uses ← to symbolize this operation.
  10. ^ John G. Kemeny and Thomas E. Kurtz 1985 Back to Basic: The History, Corruption, and Future of the Language, Addison-Wesley Publishing Company, Inc., Reading, MA, ISBN 0-201-13433-0.
  11. ^ Tausworthe 1977:101
  12. ^ Tausworthe 1977:142
  13. ^ Knuth 1973 section 1.2.1, expanded by Tausworthe 1977 at pages 100ff and Chapter 9.1
  14. ^ cf Tausworthe 1977
  15. ^ Heath 1908:300; Hawking’s Dover 2005 edition derives from Heath.
  16. ^ " 'Let CD, measuring BF, leave FA less than itself.' This is a neat abbreviation for saying, measure along BA successive lengths equal to CD until a point F is reached such that the length FA remaining is less than CD; in other words, let BF be the largest exact multiple of CD contained in BA" (Heath 1908:297).
  17. ^ For modern treatments using division in the algorithm see Hardy and Wright 1979:180, Knuth 1973:2 (Volume 1), plus more discussion of Euclid's algorithm in Knuth 1969:293-297 (Volume 2).
  18. ^ Nichomachus’ example and Heath's discussion of it appears in Heath 1908:300 and Hawking 2005:67.
  19. ^ http://aleph0.clarku.edu/~djoyce/java/elements/bookVII/propVII2.html
  20. ^ Knuth 1973:13-18. He credits "the formulation of algorithm-proving in terms of assertions and induction" to R. W. Floyd, Peter Naur, C. A. R. Hoare, H. H. Goldstine and J. von Neumann. Tausworthe 1977 borrows Knuth's Euclid example and extends Knuth's method in section 9.1 Formal Proofs (pages 288-298).
  21. ^ cf p. 294 in Robert C. Tausworthe 1977 Standardized Development of Computer Software: Part I Methods, Prentice-Hall, Inc., Englewood Cliffs, NJ, ISBN: 0-13-842195-1
  22. ^ cf page 51 in Daniel Dennett xxx Darwin's Dangerous Idea
  23. ^ cf page 875 in Peter van Emde Boas, "Machine Models and Simulation", in Jan van Leeuwen, ed., 1990, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, The MIT Press/Elsevier, ISBN 0-444-88071-2 (Volume A).