
Control flow

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Boborok at 15:50, 10 May 2006 (add === Pattern matching ===). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science and in computer programming, statements in pseudocode or in a program are normally obeyed (or executed) one after the other in the order in which they are written (sequential flow of control). Most programming languages have control flow statements which allow variations in this sequential order:

  • statements may only be obeyed under certain conditions (choice),
  • statements may be obeyed repeatedly (loops),
  • a group of remote statements may be obeyed (subroutines).

The use of subroutines does not normally cause any control flow problems, but see the discussions below on early return, error recovery, and labels as parameters.

At the machine or assembly language level, the only instructions usually available for handling choice and loops are goto and conditional goto (often known as variations of jump and/or branch). Compilers for high-level programming languages must translate all control-flow statements into these primitives.

Primitives

Labels

In a few programming languages (e.g. Fortran, BASIC), a label is just a whole number which appears at the beginning of a statement, e.g.

   1234 X = 3

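In many programming languages, a label is an identifier, typically followed by a colon (first example below); in Ada it is enclosed in double angle brackets (second example):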

   Success: print ("target has been found")

   <<Success>> Ada.Text_IO.Put_Line ("target has been found");

Historical note: Algol 60 allowed both whole numbers and identifiers as labels (both attached by colons to statements), but few if any implementations allowed whole numbers.

Goto

The most common form for the unconditional transfer of control is just

   goto label

Conditional transfer of control varies from language to language, e.g.

   IF test THEN label
   IF (test) GOTO label
   if test then goto label;
   if (test) goto label;

For a fuller discussion on the drawbacks of goto, see Goto.

In brief, undisciplined use of goto leads to spaghetti code which tends to be unmaintainable; see Edsger Dijkstra's comments in Go To Statement Considered Harmful. However, Donald Knuth has shown in Structured Programming with go to Statements that disciplined use of goto may be necessary to emulate missing control-flow structures (see "proposed control structures" below).

A number of authors have pointed out that using goto is often acceptable, provided that control is transferred to some later statement (forward jump) and that control is not transferred into the middle of some other structured statement. Some of the control-flow statements available in high-level programming languages are effectively disguised gotos which comply with these conditions, e.g. break, continue, return as found in C/C++.

Subroutines

The terminology for subroutines varies; they may alternatively be known as routines, procedures, or sometimes methods. If they can be used in an expression and return a result, they may also be known as functions.

In the 1950s, computer memories were very small by current standards, so subroutines were used primarily to reduce program size; a piece of code was written once and then used many times from various other places in the program. Nowadays, subroutines are more frequently used to help make a program more structured, e.g. by isolating some particular algorithm or hiding some particular data access method. If many programmers are working on a single program, subroutines can be used to help split up the work.

Subroutines can be made much more useful by providing them with parameters, e.g. many programming languages have a built-in square root subroutine whose parameter is the number you wish to find the square root of.

Some programming languages allow recursion, i.e. subroutines can call themselves directly or indirectly. Certain algorithms such as Quicksort and various tree-traversals are very much easier to express in recursive form than in non-recursive form.
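As a minimal sketch (in Python, which is used for all the added examples here), a recursive Quicksort can be written as follows. The function name quicksort is illustrative, and this list-building version is a simplification, not Hoare's original in-place partitioning algorithm:

```python
def quicksort(items):
    """Recursive Quicksort: partition around a pivot, then sort each part recursively."""
    if len(items) <= 1:                 # base case: nothing left to sort
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # the function calls itself on each partition (direct recursion)
    return quicksort(smaller) + [pivot] + quicksort(larger)
```

A non-recursive version would have to keep its own explicit stack of partitions still waiting to be sorted, which is what makes the recursive form easier to express.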

The use of subroutines does slow a program down slightly, due to the overhead of passing parameters, calling the subroutine, entering the subroutine (may involve saving information on a stack), and returning. The actual overhead depends on both the hardware instructions available and on any software conventions which are used; excluding parameters, the overhead may range from 2 to 14 instructions, or worse. Some compilers may effectively insert the code of a subroutine inline at the point of call to remove this overhead.

In some programming languages, the only way of returning from a subroutine is by reaching the physical end of the subroutine. Other languages have a return statement. This is equivalent to a forward jump to the physical end of the subroutine and so does not complicate the control flow situation. There may be several such statements within a subroutine if required.
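A small illustrative sketch in Python (the function names are hypothetical): the first version leaves through whichever return statement is reached first, which behaves like a forward jump to the end; the second computes the same result with a single exit at the physical end of the subroutine.

```python
def first_negative_index(values):
    """Several return statements: each acts like a forward jump to the end."""
    for i, v in enumerate(values):
        if v < 0:
            return i          # leave immediately, skipping the rest of the body
    return -1                 # reached only if no negative value was found


def first_negative_index_single_exit(values):
    """Equivalent version with a single exit at the physical end."""
    result = -1
    for i, v in enumerate(values):
        if v < 0 and result == -1:   # remember only the first negative value
            result = i
    return result
```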

In most cases, a call to a subroutine is only a temporary diversion to the sequential flow of control, and so causes no problems for control flow analysis. A few languages allow labels to be passed as parameters, in which case understanding the control flow becomes very much more complicated, since you may then need to understand the subroutine to figure out what might happen.

Here is an example illustrating the use of a subroutine which returns a value, written side-by-side in BASIC and in PHP (a language with C-like syntax).

  BASIC                     PHP
  
  10  X = 2             |   $x = 2;
  20  Y = DOUBLE(X)     |   $y = double($x);
  30  PRINT Y           |   print $y;
                        |
  FUNCTION DOUBLE(NUM)  |   function double($num) {
  NUM = NUM * 2         |     $num *= 2;
  RETURN NUM            |     return $num;
                        |   }

At execution time, the program defines a variable X and gives it a value of 2. In the next line, the program jumps down to the subroutine named DOUBLE, copying the value of X into the parameter NUM (this is known as "passing" an argument). The instructions in the subroutine are then carried out (i.e. NUM is multiplied by 2), and the resulting value is returned to the point in the main program where the subroutine was called. There another variable, Y, is set to that value (4) and, on the next line, output to the screen. The program ends execution after the third line without continuing into the function.

Minimal structured control flow

In May 1966, Böhm and Jacopini published an article in Communications of the ACM which showed that any program with gotos could be transformed into a goto-free form involving only choice (IF THEN ELSE) and loops (WHILE condition DO xxx), possibly with duplicated code and/or the addition of Boolean variables (true/false flags). Later authors have shown that choice can be replaced by loops (and yet more Boolean variables).
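The idea behind such transformations can be sketched by turning the labels of a goto program into values of an ordinary variable, so that one while loop and a chain of if statements decide which block runs next. This Python sketch applies that idea to Euclid's algorithm; the example and the name gcd_goto_free are our own choice, and it illustrates the general technique rather than the exact construction used in the 1966 paper. The goto version it replaces is shown in the docstring:

```python
def gcd_goto_free(a, b):
    """Euclid's algorithm, rewritten goto-free with a label variable.

    The goto version being simulated:
        start: if b = 0 goto done
               t := a mod b; a := b; b := t
               goto start
        done:  result is a
    """
    label = "start"                # which labelled block to execute next
    while label != "done":         # the single loop replaces every goto
        if label == "start":
            if b == 0:
                label = "done"     # was: if b = 0 goto done
            else:
                label = "step"     # fall through to the next block
        elif label == "step":
            a, b = b, a % b        # t := a mod b; a := b; b := t
            label = "start"        # was: goto start
    return a
```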

The fact that such minimalism is possible does not necessarily mean that it is desirable; after all, computers theoretically only need one machine instruction (subtract one number from another and branch if the result is negative), but practical computers have dozens or even hundreds of machine instructions.

What Böhm and Jacopini's article showed was that all programs could be goto-free. Other research showed that control structures with one entry and one exit were much easier to understand than any other form, primarily because they could be used anywhere as a statement without disrupting the control flow.

Control structures in practice

Most programming languages with control structures have an initial keyword which indicates the type of control structure involved. Languages then divide as to whether or not control structures have a final keyword.

  • No final keyword: Algol 60, C, C++, Haskell, Java, Pascal, PL/I, Python. Such languages need some way of grouping statements together:
    • Algol 60 and Pascal: begin ... end.
    • C, C++ and Java: curly brackets { ... }.
    • PL/I: DO ... END.
    • Haskell and Python: indentation level (see Off-side rule).
  • Final keyword: Ada, Algol 68, Modula-2, Fortran 77, Visual Basic. The forms of the final keyword vary:
    • Ada: final keyword is end + space + initial keyword e.g. if ... end if, loop ... end loop.
    • Algol 68: initial keyword spelled backwards e.g. if ... fi, case ... esac.
    • Fortran 77: final keyword is end + initial keyword e.g. IF ... ENDIF, DO ... ENDDO.
    • Modula-2: same final keyword end for everything.
    • Visual Basic: every control structure has its own keyword. If ... End If; For ... Next; Do ... Loop.

Languages which have a final keyword tend to have less debate regarding layout and indentation. Languages whose final keyword is of the form end + initial keyword (with or without a space in the middle) tend to be easier to learn and read.

Choice

Structured If

The structured if differs from the if already seen in that each branch can contain several statements, so no goto is needed.

The examples in this subsection are written in Ada.

then part

In its simplest form the Structured If allows execution of several statements, if and only if a condition is true:

if condition then
    statements;
end if; 

else part

The else part offers a second group of statements which is executed if the condition is not true.

if condition then
    statements;
else
    other statements;
end if;

else-if parts

By using else-if it is possible to combine several conditions. Only the statements following the first condition that is found to be true will be executed. All other statements will be skipped. The statements of the final else will be executed if none of the conditions are true.

if condition then
    statements;
elsif condition then
    more statements;
elsif condition then
    more statements;
...
else
    other statements;
end if;
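Python spells else-if as elif. In this hypothetical classify function, the second condition is also true for negative numbers, but that branch is never reached for them, because only the first true condition is acted on:

```python
def classify(n):
    """Only the branch of the first true condition runs; order matters."""
    if n < 0:
        return "negative"
    elif n < 10:          # also true for negative n, but never reached for them
        return "small"
    elif n < 100:
        return "medium"
    else:                 # runs only if every condition above is false
        return "large"
```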

Pattern matching

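Some programming languages, for example ML, provide pattern matching as a higher-level alternative to if-then-elsif chains: a value is compared with a series of patterns, and the branch of the first pattern that matches is taken. Here is an example written in the O'Caml language: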

match fruit with
| "apple" -> cook pie
| "coconut" -> cook dango_mochi
| "banana" -> mix;;

Choice based on specific constant values

These are usually known as case or switch statements. The effect is to compare a given value with specified constants and take action according to the first constant to match. If the constants form a compact range then this can be implemented very efficiently as if it were a choice based on whole numbers. This is often done by using a jump table.

   case someChar of                switch (someChar) {
      'a': actionOnA;                 case 'a': actionOnA;
      'x': actionOnX;                     break;
      'y','z':actionOnYandZ;          case 'x': actionOnX;
   end;                                   break;
                                      case 'y':
                                      case 'z': actionOnYandZ;
                                          break;
                                      default: actionOnNoMatch;
                                   }

In some languages and programming environments, a case or switch statement is considered easier to read and maintain than an equivalent series of if-else statements. One notable aspect of the switch statement in C is fall-through: control passes from one case into the next unless a break statement intervenes, which is why each case in the C example above ends with break, and why case 'y': shares the code of case 'z':.

Duff's device is a loop unwinding technique that makes use of a "switch" statement.

Choice based on whole numbers 1..N

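Relatively few programming languages have these constructs, but they can be implemented very efficiently using a computed goto (a jump through a table of labels indexed by the number). In the Fortran computed GOTO below, control continues with the next statement (outOfRangeAction) if I is out of range; the last line shows the Algol 68 equivalent: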

   GOTO (label1,label2,label3), I
   outOfRangeAction
   case I in action1, action2, action3 out outOfRangeAction esac
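Languages without such a construct can get a similar effect with a table of actions indexed by the number. A Python sketch with hypothetical names (numbered_choice, actions); the explicit range check plays the role of outOfRangeAction, and is also needed because Python would otherwise treat a negative index as counting from the end of the list:

```python
def numbered_choice(i, actions, out_of_range):
    """Choice based on whole numbers 1..N: index a table of actions,
    much as a computed goto indexes a table of labels."""
    if 1 <= i <= len(actions):
        return actions[i - 1]()       # shift from 1-based to 0-based indexing
    return out_of_range()


# hypothetical actions standing in for action1, action2, action3
actions = [lambda: "action1", lambda: "action2", lambda: "action3"]
```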

Arithmetic IF

Fortran 77 has an "arithmetic if" statement which is halfway between a computed GOTO and a case statement, based on the trichotomy of the sign of an expression e (e < 0, e = 0, e > 0):

   IF (e) label1, label2, label3

where e is any numeric expression (not necessarily an integer). Provided that evaluating e has no side effects, this is equivalent to

   IF (e .LT. 0) GOTO label1
   IF (e .EQ. 0) GOTO label2
   IF (e .GT. 0) GOTO label3

Because this arithmetic IF is equivalent to multiple GOTO statements, it is considered to be an unstructured control statement, and should not be used if more structured statements can be used. In practice it has been observed that most arithmetic IF statements referenced the following statement with one or two of the labels.
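In a language with a structured if, the same three-way choice is simply an if/elif/else chain. A Python sketch (the name arithmetic_if is hypothetical); unlike the three separate IF statements above, it evaluates e only once, when the argument is passed:

```python
def arithmetic_if(e, negative, zero, positive):
    """Structured replacement for Fortran's IF (e) label1, label2, label3.
    One of three actions is chosen by the sign of e."""
    if e < 0:
        return negative()
    elif e == 0:
        return zero()
    else:
        return positive()
```

If e is a NaN, every comparison is false, so this sketch takes the third branch; that is a property of the sketch, not a statement about Fortran.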

This specialized construct was included because the original implementation of Fortran (on the IBM 704) could implement it particularly efficiently; each of the three IF..GOTO statements above could be implemented as a single instruction, e.g. Branch if accumulator negative.

Choice system cross reference

                     Structured If
   Language          then   else   else-if   Constant choice   Numbered choice   Arithmetic IF
   Ada               yes    yes    yes       yes               no                no
   C                 yes    yes    no (1)    fall-through      no                no
   C++               yes    yes    no (1)    fall-through      no                no
   Fortran           yes    yes    yes       yes               no                yes

  1. The often-encountered else if (condition) in C/C++ is not a language feature but a set of nested, independent if-then-else statements combined with a particular source-code layout. This also means that a separate else-if construct is not really needed in C/C++.

Loops

A loop is a sequence of statements which is specified once but which may be carried out several times in succession. The code "inside" the loop (the body of the loop, shown below as xxx) is obeyed a specified number of times, or once for each of a collection of items, or until some condition is met.

In some languages, such as Scheme, loops are often expressed using tail recursion rather than explicit looping constructs.
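A sketch of the correspondence in Python, using a hypothetical sum_to example: the first function is tail-recursive (the recursive call is the last thing it does), and the second is the loop that such a call effectively becomes when tail calls are eliminated, as Scheme implementations are required to do. CPython does not eliminate tail calls, so in Python the recursive version raises RecursionError once n exceeds roughly the default recursion limit (1000); only the loop is safe for large n:

```python
def sum_to_recursive(n, acc=0):
    """Tail-recursive form: the recursive call is the last action."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n):
    """The loop a tail-call-eliminating implementation effectively runs."""
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n   # rebind the 'parameters' and jump back to the top
    return acc
```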

Count-controlled loops

Most programming languages have constructions for repeating a loop a certain number of times. Note that if N is less than 1 in these examples then the body is skipped completely. In most cases counting can go downwards instead of upwards and step sizes other than 1 can be used.

   FOR I = 1 TO N            for I := 1 to N do begin
       xxx                       xxx
   NEXT I                    end;

   DO I = 1,N                for ( I=1; I<=N; ++I ) {
       xxx                       xxx
   END DO                    }

See also For loop, Loop counter.

In many programming languages, only integers can be reliably used in a count-controlled loop. Most decimal fractions, such as 0.1, cannot be represented exactly in binary floating point, so a loop such as

   for X := 0.1 step 0.1 to 1.0 do

might be repeated 9 or 10 times, depending on rounding errors and/or the hardware and/or the compiler version.
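The effect is easy to reproduce. In this Python sketch (function names hypothetical), float_steps accumulates the step in a floating-point variable, while int_steps counts with an integer and derives each floating-point value from it, which avoids accumulated rounding error:

```python
def float_steps(start, step, stop):
    """Count-controlled loop driven by a float: x += step while x <= stop."""
    values = []
    x = start
    while x <= stop:
        values.append(x)
        x += step          # rounding error accumulates with every addition
    return values


def int_steps(start_tenths, stop_tenths):
    """Safer alternative: loop over integers and compute each float from the counter."""
    return [i / 10 for i in range(start_tenths, stop_tenths + 1)]
```

Here float_steps(0.1, 0.1, 0.3) produces only [0.1, 0.2]: after two additions x holds 0.30000000000000004, which is greater than 0.3, so the loop stops one iteration early. int_steps(1, 3) produces [0.1, 0.2, 0.3].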

Condition-controlled loops

Again, most programming languages have constructions for repeating a loop until some condition changes. Note that some variations place the test at the start of the loop, while others have the test at the end of the loop. In the former case the body may be skipped completely, while in the latter case the body is always obeyed at least once.

   DO WHILE (test)           repeat 
       xxx                       xxx 
   END DO                    until test;

   while (test) {            do
       xxx                       xxx
   }                         while (test);

See also While loop.
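Python has only the test-at-the-start form (while), so the test-at-the-end form is usually emulated with while True and a final break. In this sketch (function names hypothetical), both loops count down from n; with n = 0 the first body never runs, while the second runs once:

```python
def run_pre_test(n):
    """Test at the start (while ... do): the body may run zero times."""
    runs = 0
    while n > 0:
        runs += 1
        n -= 1
    return runs


def run_post_test(n):
    """Test at the end (repeat ... until / do ... while), emulated with while True:
    the body always runs at least once before the test is evaluated."""
    runs = 0
    while True:
        runs += 1
        n -= 1
        if not n > 0:      # exit when the continuation condition fails
            break
    return runs
```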

Collection-controlled loops

A few programming languages (e.g. Smalltalk, Perl, Java, C#, Visual Basic) have special constructs which allow you to implicitly loop through all elements of an array, or all members of a set or collection. The examples below are in Smalltalk, Perl, Java and C#, in that order:

   someCollection do: [ :eachElement | xxx ].

   foreach $item (@someArray) { xxx }

   for (String s : someCollection) { xxx }

   foreach (string s in myStringCollection) { xxx }

General iteration

General iteration constructs such as C's for statement and Common Lisp's do form can be used to express any of the above sorts of loops, as well as others, such as looping over a number of collections in parallel. Where a more specific looping construct can be used, it is usually preferred over the general iteration construct, since it often makes the purpose of the expression clearer.
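In Python, the built-in zip lets a collection-controlled for loop walk several collections in parallel. The hypothetical dot_product below is written both ways, once with zip and once with a general index-based while loop, to show how the more specific construct states the intent more directly:

```python
def dot_product(xs, ys):
    """Parallel iteration with zip(): stops at the end of the shorter collection."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_product_general(xs, ys):
    """The same loop written with a general index-based while loop."""
    total, i = 0, 0
    while i < min(len(xs), len(ys)):
        total += xs[i] * ys[i]
        i += 1
    return total
```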

Sometimes it is desirable for a program to loop forever, or until an exceptional condition such as an error arises. For instance, an event-driven program may be intended to loop forever handling events as they occur, only stopping when the process is killed by the operator.

More often, an infinite loop is due to a programming error in a condition-controlled loop, wherein the loop condition is never changed within the loop.

Continuation with next iteration

Sometimes within the body of a loop there is a desire to skip the remainder of the loop body and continue with the next iteration of the loop. Some languages provide a statement such as continue which will do this. The effect is to prematurely terminate the innermost loop body and then resume as normal with the next iteration. If the iteration is the last one in the loop, the effect is to terminate the entire loop early.
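A minimal Python sketch of continue (the function name is hypothetical), skipping negative values while summing a list:

```python
def sum_non_negative(values):
    """continue skips the rest of the body and starts the next iteration."""
    total = 0
    for v in values:
        if v < 0:
            continue        # skip negative values
        total += v
    return total
```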

Early exit from loops

When using a count-controlled loop to search through a table, you may wish to stop searching as soon as you have found the required item. Some programming languages provide a statement such as break or exit, whose effect is to terminate the current loop immediately and transfer control to the statement immediately following that loop. Things can get a bit messy if you are searching a multi-dimensional table using nested loops (see Proposed control structures below).

The following example is written in Ada, which supports both early exit from loops and loops with the test in the middle. The two features are very similar, and comparing this snippet with the one under "Loop with test in the middle" shows the difference: an early exit must be combined with an if statement, whereas the exit when condition in the middle is a self-contained construct.

   with Ada.Text_IO;
   with Ada.Integer_Text_IO;

   procedure Print_Squares is 
       X : Integer;
   begin
       Read_Data : loop
           Ada.Integer_Text_IO.Get(X);
           if X = 0 then
               exit Read_Data;
           end if;
           Ada.Integer_Text_IO.Put(X * X);
           Ada.Text_IO.New_Line;
       end loop Read_Data;
   end Print_Squares;


Python supports conditional execution of code depending on whether a loop was exited early or not. For example,

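   def isprime(n):                  # simple trial-division test (helper for this example)
       return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

   set_of_numbers = {4, 6, 9, 11}   # sample data for this example
   for n in set_of_numbers:
       if isprime(n):
           print("Set contains a prime number")
           break
   else:
       print("Set did not contain any prime numbers")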

Both Python's for and while loops support such an else clause, which is executed only if early exit of the loop did not occur.

Loop system cross reference table

               Conditional loop
   Language    begin  middle  end   Count    Collection  General  Infinite (1)  Early exit      Continuation
   Ada         yes    yes     yes   yes      arrays      no       yes           deep nested     no
   C           yes    no      yes   no (2)   no          yes      no            one level       yes
   C++         yes    no      yes   no (2)   no          yes      no            one level (4)   yes
   FORTRAN     yes    no      no    yes      no          no       yes           one level       yes
   Python      yes    no      no    yes (3)  yes         no       no            one level (4)   yes

  1. while (true) does not count as an infinite loop here, since it is not a dedicated language feature.
  2. C/C++'s for (init; condition; loop) is a general looping construct, not specifically a counting one.
  3. Using Python's built-in range.
  4. Multi-level exit is possible using exceptions.

Self-modifying code

Self-modifying code is supported by most assembly languages, and implemented by the ALTER verb in COBOL. It can be used to alter not merely the flow of control, but the statements that alter the flow of control. Self-modifying code tends to be very hard to understand and maintain, and the performance gains it once offered are no longer relevant: on most modern architectures, modifying running code is either impossible or very inefficient because it invalidates caches and pipelines. For this reason, self-modifying code is no longer advocated or generally used for control flow.

Structured non-local control flow

Many programming languages, particularly those which favor more dynamic styles of programming, offer constructs for non-local control flow. These cause the flow of execution to jump out of a given context and resume at some predeclared point. Exceptions, conditions, and continuations are three common sorts of non-local control constructs.

Conditions

PL/I has some 22 standard conditions (e.g. ZERODIVIDE, SUBSCRIPTRANGE, ENDFILE) which can be raised and which can be intercepted by: ON condition action; Programmers can also define and use their own named conditions.

Like the unstructured if, only one statement can be specified, so in many cases a GOTO is needed to decide where the flow of control should resume.

Unfortunately, some implementations had a substantial overhead in both space and time (especially SUBSCRIPTRANGE), so many programmers tried to avoid using conditions.

Common syntax example:

 ON condition GOTO label

Exceptions

Modern languages have a structured construct for exception handling which does not rely on the use of GOTO:

   try {
       xxx1                                        // Somewhere in here
       xxx2                                        //     use: throw someValue;
       xxx3
   } catch (someClass & someId) {            // catch value of someClass
       actionForSomeClass 
   } catch (someType & anotherId) {          // catch value of someType
       actionForSomeType
   } catch (...) {                           // catch anything not already caught
       actionForAnythingElse
   }

Any number and variety of catch clauses can be used above. In D, Java, C#, and Python, a finally clause can be added to the try construct. No matter how control leaves the try block, the code inside the finally clause is guaranteed to execute. This is useful when writing code that must relinquish an expensive resource (such as an opened file or a database connection) when finished processing:

   FileStream stm = null;                    // C# example
   try {
       stm = new FileStream("logfile.txt", FileMode.Create);
       return ProcessStuff(stm);                    // may throw an exception
   } finally {
       if (stm != null)
           stm.Close();
   }

Since this pattern is fairly common, C# has a special syntax that is slightly more readable:

   using (FileStream stm = new FileStream("logfile.txt", FileMode.Create)) {
       return ProcessStuff(stm);                  // may throw an exception
   }

When control leaves the using block, whether normally or because of an exception, the compiler-generated code calls stm.Dispose(), which closes the file.
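Python offers the same two mechanisms: try/finally, and the with statement, whose context-manager protocol plays roughly the role of C#'s using. In this sketch the names log, process and Resource are hypothetical; the finally clause runs whether process returns normally or raises, and __exit__ runs however the with block is left:

```python
log = []


def process(fail):
    """Hypothetical worker: 'open' a resource, maybe fail, always 'close' it."""
    try:
        log.append("open")
        if fail:
            raise ValueError("processing failed")
        return "done"              # finally still runs before the value is returned
    finally:
        log.append("close")        # runs on normal return and on exception


class Resource:
    """Minimal context manager; __exit__ plays the role of C#'s Dispose()."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True         # release the resource however the block is left
        return False               # do not suppress any exception


with Resource() as r:
    pass                           # work with r here
print(r.closed)                    # prints True
```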

All these languages define standard exceptions and the circumstances under which they are thrown. Users can throw exceptions of their own (in fact C++ and Python allow users to throw and catch almost any type).

If there is no catch matching a particular throw, then control percolates back through subroutine calls and/or nested blocks until a matching catch is found or until the end of the main program is reached, at which point the program is forcibly stopped with a suitable error message.
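A Python sketch of this percolation (all names hypothetical): parse_record raises ValueError for non-numeric text, load_records has no handler so the exception passes straight through it, and the handler in load_or_default catches it two calls up:

```python
def parse_record(text):
    return int(text)                 # raises ValueError for non-numeric text


def load_records(lines):
    # no handler here: an exception passes straight through this function
    return [parse_record(line) for line in lines]


def load_or_default(lines, default):
    try:
        return load_records(lines)
    except ValueError:               # first matching handler up the call chain
        return default
```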

AppleScript provides several pieces of information to the error handler of a "try" block:

try
    set myNumber to myNumber / 0

on error e  number n  from f  to t  partial result pr

    if ( e = "Can't divide by zero" ) then display dialog "You idiot!"

end try

Non-local control flow cross reference

   Language         Long jump   Conditions   Exceptions
   Ada              no          no           yes
   C                yes         no           no
   C++              yes         no           yes
   C#               no          no           yes
   D                no          no           yes
   Java             no          no           yes
   Python           no          no           yes
   Ruby             no          no           yes
   Objective C      yes         no           yes
   PL/I             no          yes          no

Proposed control structures

In a spoof Datamation article (December 1973), R. Lawrence Clark suggested that the GOTO statement could be replaced by the COMEFROM statement, and provided some entertaining examples. COMEFROM was later actually implemented in the INTERCAL programming language, a language designed to make programs as obscure as possible.

In his 1974 article "Structured Programming with go to Statements", Donald Knuth identified two situations which were not covered by the control structures listed above, and gave examples of control structures which could handle these situations. Despite their utility, these constructions have not yet found their way into mainstream programming languages.

Loop with test in the middle

This was proposed by Dahl in 1972.

   loop                           loop
       xxx1                           read(char);
   while test;                    while not atEndOfFile;
       xxx2                           write(char);
   repeat;                        repeat;

If xxx1 is omitted we get a loop with the test at the top. If xxx2 is omitted we get a loop with the test at the bottom. If while is omitted we get an infinite loop. Hence this single construction can replace several constructions in most programming languages. A possible variant is to allow more than one while test; within the loop, but the use of exitwhen (see next section) appears to cover this case better.

As the example on the right shows (copying a file one character at a time), there are simple situations where this is exactly the right construction to use in order to avoid duplicated code and/or repeated tests. In Ada, the above loop construct (loop-while-repeat) can be represented using a standard infinite loop (loop - end loop) that has an exit when clause in the middle (not to be confused with the exitwhen statement in the following section).

   with Ada.Text_IO;
   with Ada.Integer_Text_IO;

   procedure Print_Squares is 
       X : Integer;
   begin
       Read_Data : loop
           Ada.Integer_Text_IO.Get(X);
           exit Read_Data when X = 0;
           Ada.Integer_Text_IO.Put(X * X);
           Ada.Text_IO.New_Line;
       end loop Read_Data;
   end Print_Squares;

Naming a loop (like Read_Data in this example) is optional, but it allows an exit statement to leave an outer loop of several nested loops.
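Python has no loop-while-repeat construct either, but the same shape can be written as an infinite loop with a test and break in the middle. This sketch (names hypothetical) follows the file-copying example above, reading from any text file-like object; the second version uses an assignment expression (Python 3.8 or later) to move the read into the loop test:

```python
import io


def copy_chars(src, dst):
    """Loop with the test in the middle, emulated with while True / break."""
    while True:
        ch = src.read(1)        # xxx1: read(char)
        if ch == "":            # 'while not atEndOfFile': read(1) returns "" at end of file
            break
        dst.write(ch)           # xxx2: write(char)


def copy_chars_walrus(src, dst):
    """Same loop, with the read folded into the test by an assignment expression."""
    while (ch := src.read(1)) != "":
        dst.write(ch)


src, dst = io.StringIO("abc"), io.StringIO()
copy_chars(src, dst)
print(dst.getvalue())           # prints abc
```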

Multiple early exit/exit from nested loops

This was proposed by Zahn in 1974. A modified version is presented here.

   exitwhen EventA or EventB or EventC;
       xxx
   exits
       EventA: actionA
       EventB: actionB
       EventC: actionC
   endexit;

exitwhen is used to specify the events which may occur within xxx; the occurrence of an event is indicated by using its name as a statement. When some event does occur, the relevant action is carried out, and then control passes just after endexit. This construction provides a very clear separation between determining that some situation applies, and the action to be taken for that situation.

exitwhen is conceptually similar to the try/catch construct in C++, but is likely to be much more efficient since there is no percolation across subroutine calls and no transfer of arbitrary values. Also, the compiler can check that all specified events do actually occur and have associated actions.

The following simple example involves searching a two-dimensional table for a particular item.

   exitwhen found or missing;
       for I := 1 to N do
           for J := 1 to M do
               if table[I,J] = target then found;
       missing;
   exits
       found:   print("item is in table");
       missing: print("item is not in table");
   endexit;
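In a language without exitwhen, one common substitute is a helper function whose return statements act as the events, since return leaves every enclosing loop at once; raising an exception is another option. A Python sketch of the table search above, with hypothetical names, returning the messages instead of printing them:

```python
def search_table(table, target):
    """Emulate 'exitwhen found or missing' for a two-dimensional search."""
    def locate():
        for row in table:
            for item in row:
                if item == target:
                    return "found"    # event 'found': leaves both loops at once
        return "missing"              # event 'missing': both loops completed

    # the 'exits' part: exactly one action per event
    actions = {
        "found": lambda: "item is in table",
        "missing": lambda: "item is not in table",
    }
    return actions[locate()]()
```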

Anecdotal evidence

The following statistics apply to a 6000-line compiler written in a private language containing the above constructions.

There are 10 condition-controlled loops, of which 6 have the test at the top, 1 has the test at the bottom, and 3 have the test in the middle.

There are 18 exitwhen statements, 5 with 2 events, 11 with 3 events, and 2 with 4 events. When these were first used in the compiler, replacing various flags and tests, the number of source lines increased by 0.1%, the size of the object code decreased by 3%, and the compiler (when compiling itself) was 4% faster. Prior to the introduction of exitwhen, 4 of the condition-controlled loops had more than one while test; and 5 of the count-controlled loops also had a while test.


References

  • Dahl, O.-J., Dijkstra, E. W., and Hoare, C. A. R. Structured Programming. Academic Press, 1972.
  • Knuth, Donald E. "Structured Programming with go to Statements." ACM Computing Surveys 6(4):261-301, December 1974.
  • Böhm, Corrado, and Jacopini, Giuseppe. "Flow Diagrams, Turing Machines and Languages with only Two Formation Rules." Comm. ACM 9(5):366-371, May 1966.
  • Hoare, C. A. R. "Partition: Algorithm 63," "Quicksort: Algorithm 64," and "Find: Algorithm 65." Comm. ACM 4:321-322, 1961.
  • Zahn, C. T. "A control statement for natural top-down structured programming." Presented at the Symposium on Programming Languages, Paris, 1974.