
User:Wvbailey/Euclid's algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Wvbailey (talk | contribs) at 17:44, 28 January 2011 (A inelegant program for Euclid's algorithm). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A sandbox page:

Algorithms for computers: an example

In computer systems, an algorithm is an instance of logic written in software by software developers to be effective for the intended "target" computer(s), so that the software on the target machines will do something -- produce output.

Algorithm versus function computed by an algorithm: For a given target computer, multiple algorithms may exist. Rogers observes that "It is . . . important to distinguish between the notion of algorithm, i.e. procedure and the notion of function computable by algorithm, i.e. mapping yielded by procedure. The same function may have several different algorithms"[1].

Algorithms for computers (and computors): A computer (or human "computor"[2]) is a restricted type of machine, a "discrete deterministic mechanical device"[3] that blindly follows its "sequence of instructions"[4]. Melzak's and Lambek's primitive models[5] reduced this notion to four elements: (i) discrete, distinguishable locations, (ii) discrete, indistinguishable counters in the locations, (iii) a capable agent, and (iv) a list of instructions.

Knuth advises the reader that "the best way to learn an algorithm is to try it . . . immediately take pen and paper and work through an example"[6]. But what about a simulation or the real thing? The programmer must translate the algorithm into a language that the simulator/computer (or computor) can effectively execute[7]. Because the possibility of translation errors is high, the programmer should use structured programming techniques that restrict the program "structures" to a tiny subset: the three Böhm-Jacopini canonical structures[8] DO-THEN, IF-THEN-ELSE, and WHILE-DO, perhaps augmented with DO-WHILE and CASE[9]. An additional benefit is a program that lends itself to proofs of correctness using mathematical induction[10].

Location-identifiers:

  • A location is symbolized by upper case letter(s), e.g. S, A%, etc.
  • The varying quantity in a location, its counters (as opposed to the name of the location itself), will be symbolized by lower case letter(s) associated with the location's name. For example, location L at the start might contain l = 3009 counters. Without confusion the word "counters" can be dropped, so one can say that location L contains the number l, which at the start is 3009. As the program executes, this number will change from 3009 to 2125, 1241, 884, 527, 357, 170 and 17.

Instructions: Besides INPUT, OUTPUT, and HALT only 3 types are required:

  • Conditional control instruction: IF [the test goes here] yields "true" THEN [GOTO instruction] yyy [else if test yields "false" continue in numerical sequence -- this part is assumed and not written].
  • An unconditional GOTO [instruction xxx]
  • Assignment (replacement)[11]: Symbolized by a box with the := symbol (Knuth uses ←); on the left is the destination-location, on the right is the source location(s), with or without a formula. Examples:
S := R replaces the contents s in location S with the contents r in location R.
R := L - S subtracts the number s in location S from the number l in L and replaces the contents of R with the number l - s. Alternately, if the destination location is L (a process known as read-modify-write) then symbolically L := L - S.
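Modeling the locations as ordinary integer variables (an assumption for illustration; the text's locations hold counters, not binary numbers), the replacement operation behaves like assignment in any imperative language. A minimal Python sketch:

```python
# Locations modeled as Python variables; the values are the "counters" each holds.
L, S, R = 3009, 884, 0

R = L - S      # R := L - S : subtract the number in S from the number in L
S = R          # S := R     : replace the contents of S with the contents of R
L = L - S      # L := L - S : read-modify-write, destination same as a source
```

After these three replacements, R holds 2125, S holds 2125, and L holds 884.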

Canonical flowchart symbols are made of the basic rectangle, the diamond and the dot (OR-tie):

  • IF-THEN with ELSE to the next instruction in sequence: a diamond with an arrow entering at the top and arrows leaving from the left or right vertex and from the bottom.
  • DO-THEN: A sequence of assignment-rectangles descending the page.
  • Unconditional GOTO: not recommended in structured programming but is embedded in the WHILE-DO: rectangle with single arrow exiting bottom but going to instruction specified.
  • WHILE-DO: IF-THEN followed by DO-THEN ending in a GOTO that returns its arrow to the top of the diamond and OR-ties with the arrow entering it.

Euclid’s algorithm

Euclid’s algorithm appears as Proposition II in Book VII ("Elementary Number Theory") of his Elements [12]. Euclid poses the problem: "Given two numbers not prime to one another, to find their greatest common measure". Euclid defines "A number [to be] a multitude composed of units": a counting number, a positive integer not including 0. And to "measure" is to place a shorter measuring length s successively (q times) along longer length l until the remaining portion r is less than the shorter length s[13]. In modern words, remainder r = l - q*s, q being the quotient[14].
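Euclid's "measuring" can be sketched directly. The helper below, `measure`, is a hypothetical name introduced here for illustration; it lays the shorter length s along the longer length l by repeated subtraction, returning the quotient q and remainder r = l - q*s:

```python
def measure(l, s):
    """Lay length s successively along l until the remaining portion is
    less than s; return (q, r) with r = l - q*s and 0 <= r < s."""
    q, r = 0, l
    while r >= s:      # s still "measures" the remaining portion
        r -= s         # remove one copy of the measuring length
        q += 1
    return q, r

# Example: 884 measures 3009 three times, leaving remainder 357.
print(measure(3009, 884))   # -> (3, 357)
```

This is exactly the division 3009 = 3*884 + 357 carried out by subtraction alone.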

For Euclid’s method to succeed, the starting lengths must satisfy three requirements: (i) the lengths must not be 0, AND (ii) the subtraction must be “proper”: a test must guarantee that the smaller of the two numbers is subtracted from the larger (alternately, the two can be equal so their subtraction yields 0), AND (iii) the two lengths must not be prime to one another.

For Euclid, the first requirement was tacit and not discussed: a length of 0 would not meet the definition of “a number”, i.e. a multitude of units. However, Nicomachus’ example [15] hints at the problem of a remainder equal to “zero”; in fact the equivalent is what terminates his algorithm. Heath 1908:300 reports (quoting Nicomachus) “ ‘. . . but 7 cannot be subtracted from 7.’ The last phrase is curious, but the meaning of it is obvious, as also the meaning of the phrase about ending 'at one and the same number' ".

With regards to the second requirement, Euclid begins his proof with the possibility that at the outset the two lengths are the same. Indeed if this is the case then the method ends. The computer algorithm will have to account for this possibility.

With regards to the third requirement, Euclid stipulated this so that he could construct a reductio ad absurdum proof that the two numbers' common measure is in fact the greatest. While Nicomachus' algorithm is the same as Euclid's, when the numbers are prime to one another it yields the number "1" for their common measure. So to be precise the following is really Nicomachus' algorithm.

An inelegant program for Euclid's algorithm

The reason why the following is declared "inelegant" is described in the following section.

The following algorithm is framed as Knuth's 4-step version of Euclid's and Nicomachus' algorithms, but rather than using division to find the remainder it uses successive subtractions of the shorter length s from the remaining length r until r is less than s. The bold-face headings are adapted from Knuth 1973:2-4:

1 [Into two locations L and S put the numbers l and s that represent the two lengths]: INPUT L, S

E0: [Ensure r ≥ s.]

2 [Initialize R: make the remaining length r in R equal to the starting/initial/input length l] R := L
3 [Ensure the smaller of the two numbers is in S, the larger in R]: IF R > S THEN the contents of L is the larger number: skip over the exchange-steps 4, 5 and 6: GOTO step 7 ELSE continue,
4 [The contents of R (and L) are less than or equal to the contents of S: Swap the contents of R and S.] L := R (this first step is redundant, but will be useful for later discussion).
5 R := S ( s is larger so it goes into R)
6 S := L ( l, and also r is smaller so it goes into S)

E1:[Find remainder]: Until the remaining length r in R is less than the shorter length s in S, repeatedly subtract the measuring number s in S from the remaining length r in R.

7 [Test the measuring-number s in S for greater than the remaining portion r in R.]: IF S > R THEN done measuring: go to step 10 ELSE measure again: continue,
8 [Measure again: Subtract smaller number s in location S from remaining portion r in R, and deposit the difference back in R, i.e. rnew := rold - s]: R := R - S
9 [Remainder-loop]: Go to step 7.

E2: [Is the remainder 0?]: After the inner-loop test terminates it sends the program forward to test whether or not the algorithm is done: EITHER (i) the last measure was exact, the remainder in R is 0 and the program can halt, OR (ii) the algorithm must continue: the last measure left a remainder in R less than the measuring number in S.

10 IF R = 0 THEN done: go to step 15 ELSE continue,

E3: [Interchange]: The nut of Euclid's algorithm. Use the remainder r to measure what was previously the smaller number s; L serves as a temporary location for the exchange: R-->L, S-->R, L-->S; then go back to E1.

11 L := R
12 R := S
13 S := L
14 [Repeat the measuring process]: Go to step 7
15 [Done, the contents of S, is the greatest common divisor]: PRINT S
16 HALT, END, STOP.

Summary of the inelegant program:

 1 INPUT L, S
 2 R := L

E0: [Ensure measured length ''r'' is longer than measuring length ''s'']
 3 IF R > S THEN GOTO 7
 4 L := R
 5 R := S
 6 S := L

E1: [Find remainder.]
 7 IF S > R THEN GOTO 10
 8 R := R - S
 9 GOTO 7
  
E2: [Is remainder 0?]
 10 IF R = 0 THEN GOTO 15

E3: [Interchange.]
 11 L := R
 12 R := S
 13 S := L
 14 GOTO 7

 15 PRINT S
 16  HALT, END, STOP.
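As a check, the numbered steps above can be transcribed almost line-for-line into Python. This is a sketch, not part of the original article: the GOTOs of steps 3, 9 and 14 are folded into `if` and `while` constructs, and the function name is invented here.

```python
def gcd_inelegant(l, s):
    """Steps 1-16 of the 'inelegant' program, GOTOs replaced by loops."""
    L, S = l, s             # step 1: INPUT L, S
    R = L                   # step 2: R := L
    if not R > S:           # step 3: IF R > S THEN GOTO 7
        L = R               # step 4 (redundant, as the text notes)
        R = S               # step 5: the larger number goes into R
        S = L               # step 6: the smaller goes into S
    while True:
        while not S > R:    # step 7: IF S > R THEN GOTO 10
            R = R - S       # step 8: measure again
        if R == 0:          # step 10: IF R = 0 THEN GOTO 15
            return S        # step 15: PRINT S
        L = R               # steps 11-13: interchange via L
        R = S
        S = L

print(gcd_inelegant(3009, 884))   # -> 17
```

Running it on the article's test values reproduces the hand-worked results, including the equal-inputs case handled by step 3.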

An elegant program

The notion of "simplicty and elegance" appears informally in Knuth 1973:7 and precisely in Chaitin 2005:32:

"I'll show that you can't prove that a program is 'elegant,' by which I mean that it's the smallest ppssible program for producing the output that it does."

The following version of Euclid's algorithm requires only 6 core instructions to do what 13 are required to do in the inelegant version; worse, the "inelegant" program requires more types of instructions:

   10 REM Euclid's algorithm for greatest common divisor
   20 PRINT "Type two integers greater than 0"
   30 INPUT A%,B%
   40 IF B%=0 THEN GOTO 100
   50 IF A% > B% THEN GOTO 80
   60 LET B%=B%-A%
   70 GOTO 40
   80 LET A%=A%-B%
   90 GOTO 40
  100 PRINT A%
  110 END

How "elegant" works: In place of an outer "Euclid loop", "elegant" shifts back and forth between two "co-loops", an A% > B% loop that computes A% := A% - B%, and a B% ≤ A% loop that computes B% := B% - A%. This works because, when at last the minuend M is less than or equal to the subtrahend S ( Difference = Minuend - Subtrahend), the minuend can become s (the new measuring length) and the subtrahend can become the new l (the length to be measured); in other words the "sense" of the subtraction reverses.

Testing the algorithms

Does an algorithm do what its author wants it to do? The exceptional cases must be identified. Will the above algorithms perform properly when B > A, A > B, A = B? (Yes to all.) What happens when A = 0? B = 0? Both A and B = 0? (No: "inelegant" computes forever in all these cases; "elegant" computes forever when A% = 0. A fully effective algorithm will have to handle these cases.)

As suggested by Knuth one can proceed with paper and pencil to do a few test cases. One source[16] uses 3009 and 884. Another interesting case is the two relatively-prime numbers 14157 and 5950.
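These hand checks can be automated. The sketch below compares a subtractive gcd (a stand-in equivalent to the "elegant" program, with an invented name) against Python's `math.gcd` on the article's test values:

```python
import math

def gcd_subtractive(a, b):
    """Subtraction-only gcd; requires a > 0 and b > 0,
    per the exceptional cases noted above."""
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

# The article's test values, plus the equal-inputs case.
for a, b in [(3009, 884), (14157, 5950), (7, 7), (1, 1)]:
    assert gcd_subtractive(a, b) == math.gcd(a, b)
print("all cases agree")
```

In particular the relatively-prime pair 14157 and 5950 yields 1, as the discussion of Nicomachus' algorithm predicts.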

Proof of program correctness by use of mathematical induction: Knuth demonstrates the application of mathematical induction to an "extended" version of Euclid's algorithm, and he proposes "a general method applicable to proving the validity of any algorithm"[17]. Tausworthe proposes that a measure of the complexity of a program be the length of its correctness proof[18].

When it comes to simulating the algorithm on a machine, can it be so general that the mechanical "platform" does not matter? In this regard, philosopher Daniel Dennett identifies a key feature of an algorithm: what he calls its "substrate neutrality":

"Substrate neutrality: an algorithm relies on its logical structure. Thus, the particular form in which an algorithm is manifested is not important"[19].

Van Emde Boas, on the other hand, observes "even if we base complexity theory on abstract instead of concrete machines, arbitrariness of the choice of a model remains. It is at this point that the notion of simulation enters"[20]. But what model should be used for the simulation? When speed is being measured, the model matters. For example, "inelegant" will execute much faster if division is available.

Good algorithms

Knuth states that:

"In practice we not only want algorithms, we want good algorithms. One criterion of goodness is the length of time taken to perform the algorithm . . .." (p. 7)

Knuth discusses the notion of fixing one parameter such as l (or a) and letting s (or b) range over all the positive integers, then taking an average; he gives an example of how to do this for one particular instruction (his E1')[21].

But a global count of all the steps, from the first to the HALT, will do for illustration. For example, if "inelegant" is given the numbers 3009 and 884, it produces "17" as the g.c.d. and halts in 89 steps. "Elegant" requires only 71 steps. If the numbers 14157 and 5950 are input, "inelegant" halts at 371 steps while "elegant" halts at 321 steps.
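The totals above depend on exactly which operations one tallies. A related, easily reproduced measure (a sketch introduced here, not from the article) is to count only the subtractions; for a given input pair both programs perform the same sequence of subtractions, and the difference in their global step counts is control overhead:

```python
def count_subtractions(a, b):
    """Count the subtraction operations a purely subtractive gcd performs."""
    n = 0
    while b != 0:
        if a > b:
            a = a - b      # one R := R - S (or A% := A% - B%) step
        else:
            b = b - a      # one B% := B% - A% step
        n += 1
    return n

# 3009 and 884 require 17 subtractions: 3 + 2 + 2 + 10 across four rounds.
print(count_subtractions(3009, 884))   # -> 17
```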

Can the inelegant algorithm be improved?

Once the author judges the algorithm "fit" and "effective" -- that is, it does what its author wants it to do -- then the question becomes: can it be improved?

Improvement may involve an acceptance of a tradeoff. Examination of where time is lost points to the loops, in particular an inner loop. If one were to combine the two-step sequence R := L - S followed by L := R into a single L := L - S, then "inelegant's" execution time would be shortened. (This requires other changes; see "less-inelegant" below.)

Given the two numbers 3009, 884, the count is "less-inelegant" 73, "elegant" 71; "less-inelegant" loses. But given 14157, 5950 the results are "less-inelegant" 301, "elegant" 321; "elegant" loses. Which is better? Knuth's method of averaging over a large number of trials offers a possible measure of their goodness.

 A less-inelegant program:
 E0. [Ensure l > s] and E1. [Find remainder.]
   1 INPUT L, S
   2 IF S > L THEN GOTO 5
   3 L := L - S
   4 GOTO 2  
 E2. [L contains the remainder. Is remainder 0?]
   5 IF L = 0 THEN GOTO 10
 E3. [Interchange]
   6 R := L
   7 L := S
   8 S := R
   9 GOTO 3
 
   10 PRINT S
   11 HALT, END, STOP  
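With the GOTOs replaced by loops, the listing above becomes the following Python sketch (the function name is invented here; the GOTO 3 after the interchange is folded into the `while` test, which is equivalent because the interchange always leaves S less than L):

```python
def gcd_less_inelegant(l, s):
    """The 'less-inelegant' program: the single read-modify-write
    L := L - S replaces the two-step R := L - S; L := R."""
    L, S = l, s             # step 1: INPUT L, S
    while True:
        while not S > L:    # step 2: IF S > L THEN GOTO 5
            L = L - S       # step 3: the combined subtraction
        if L == 0:          # step 5: is the remainder 0?
            return S        # step 10: PRINT S
        R = L               # steps 6-8: interchange via R
        L = S
        S = R

print(gcd_less_inelegant(3009, 884))   # -> 17
```

The inner loop now touches one location instead of two, which is where the saved steps come from.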
  1. ^ Rogers 1987:1-2
  2. ^ In his essay "Calculations by Man and Machine: Conceptual Analysis" Sieg 2002:390 credits this distinction to Robin Gandy, cf Wilfried Sieg et al., 2002, Reflections on the Foundations of Mathematics: Essays in Honor of Solomon Feferman, Association for Symbolic Logic, A. K. Peters, Ltd., Natick, MA.
  3. ^ cf Gandy 1980:126, Robin Gandy "Church's Thesis and Principles for Mechanisms" appearing on pp. 123-148 in J. Barwise et al. 1980 The Kleene Symposium, North-Holland Publishing Company.
  4. ^ Or a "robot": "A computer is a robot that will perform any task that can be described as a sequence of instructions" cf Stone 1972:3
  5. ^ Lambek’s "abacus" is a "countably infinite number of locations (holes, wires etc.) together with an unlimited supply of counters (pebbles, beads, etc). The locations are distinguishable, the counters are not". The holes will have unlimited capacity, and standing by is an agent who understands and is able to carry out the list of instructions" Lambek 1961:295. Lambek references Melzak who defines his Q-machine as "an indefinitely large number of locations . . . an indefinitely large supply of counters distributed among these locations, a program, and an operator whose sole purpose is to carry out the program." (Melzak 1961:283) B-B-J (loc. cit.) add the stipulation that the holes are "capable of holding any number of stones" (p. 46). Both Melzak and Lambek appear in The Canadian Mathematical Bulletin, vol. 4, no. 3, September 1961.
  6. ^ Knuth 1973:4
  7. ^ Stone observes that "We say that an instruction is effective if there is a procedure that the robot can follow in order to determine precisely how to obey the instruction" Stone 1972:6
  8. ^ Tausworthe 1977:101
  9. ^ Tausworthe 1977:142
  10. ^ Knuth 1973 section 1.2.1, expanded by Tausworthe 1977 at pages 100ff and Chapter 9.1
  11. ^ Knuth 1973:3 "The all important replacement operation (sometimes called assignment or substitution)"
  12. ^ Heath 1908:300; Hawking’s Dover 2005 edition derives from Heath.
  13. ^ " 'Let CD, measuring BF, leave FA less than itself.' This is a neat abbreviation for saying, measure along BA successive lengths equal to CD until a point F is reached such that the length FA remaining is less than CD; in other words, let BF be the largest exact multiple of CD contained in BA" (Heath 1908:297
  14. ^ For modern treatments using division in the algorithm see Hardy and Wright 1979:180, Knuth 1973:2.
  15. ^ Nicomachus' example and Heath's discussion of it appear in Heath 1908:300 and Hawking 2005:67.
  16. ^ http://aleph0.clarku.edu/~djoyce/java/elements/bookVII/propVII2.html
  17. ^ Knuth 1973:13-18. He credits "the formulation of algorithm-proving in terms of assertions and induction" to R. W. Floyd, Peter Naur, C. A. R. Hoare, H. H. Goldstine and J. von Neumann. Tausworthe 1977 borrows Knuth's Euclid example and extends Knuth's method in section 9.1 Formal Proofs (pages 288-298).
  18. ^ cf p. 294 in Robert C. Tausworthe 1977 Standardized Development of Computer Software: Part I Methods, Prentice-Hall, Inc., Englewood Cliffs, NJ, ISBN: 0-13-842195-1
  19. ^ cf page 51 in Daniel Dennett xxx Darwin's Dangerous Idea
  20. ^ cf page 875 in Peter van Emde Boas' "Machine Models and Simulation" in Jan van Leeuwen ed., 1990, Handbook of Theoretical Computer Science. Volume A: Algorithms and Complexity, The MIT Press/Elsevier, ISBN 0-444-88071-2 (Volume A).
  21. ^ See Knuth 1973:7. His particular algorithm's instruction E1 yields a surprising number involving 12 ln 2/π².