Loop dependence analysis

Loop dependence analysis is a process which can be used to find dependencies within and across the iterations of a loop, with the goal of exploiting different forms of parallelism. Parallelism allows multiple processors to work on different portions of the loop simultaneously, which decreases the total execution time of a program by sharing the processing load among the processors.

In compiler theory, loop dependence analysis is the task of determining whether statements within a loop body form a dependence, with respect to array access and modification; induction, reduction and private variables; simplification of loop-independent code; and management of conditional branches inside the loop body.

Loop dependence analysis is mostly used to find ways to perform automatic parallelization, by means of automatic vectorization, utilization of shared memory, and various other methods.

Description

Loop dependence analysis occurs on a normalized loop of the form:

for i1 until U1 do
  for i2 until U2 do
    ...
    for in until Un do
      body
    done
  done
done

where body may contain:

S1  a[f1(i1, ..., in), ..., fm(i1, ..., in)] := ...
    ...
S2  ... := a[h1(i1, ..., in), ..., hm(i1, ..., in)]

where a is an m-dimensional array and fn, hn, etc. are functions mapping from all iteration indices (in) to a memory access in a particular dimension of the array.

For example, in C:

for (i = 0; i < U1; i++)
  for (j = 0; j < U2; j++)
    a[i+4-j] = b[2*i-j] + i*j;

f1 would be i+4-j, controlling the write on the first dimension of a, and h1 would be 2*i-j, controlling the read on the first dimension of b.

The scope of the problem is to find all possible dependencies between S1 and S2. To be conservative, any dependence which cannot be proven false must be assumed to be true.

Independence is shown by demonstrating that no two instances of S1 and S2 access or modify the same spot in array a. When a possible dependence is found, loop dependence analysis usually makes every attempt to characterize the relationship between dependent instances, as some optimizations may still be possible. It may also be possible to transform the loop to remove or modify the dependence.

In the course of (dis)proving such dependencies, a statement S may be decomposed according to which iteration it comes from. For instance, S[1,3,5] refers to the iteration where i1 = 1, i2 = 3 and i3 = 5. Of course, references to abstract iterations, such as S[d1+1,d2,d3], are both permitted and common.

Loops can consume a large fraction of a program's total execution time, depending on the number of iterations they perform and the overhead each iteration carries. Loop dependence analysis is used to reduce this cost: its basic goal is to find the sections of a loop that can be executed in parallel, exploiting multiprocessing across loop iterations so that loops finish faster. This process is often referred to as parallelization. To exploit parallelization, the dependencies within individual loops must first be analyzed. These dependencies determine which statements in the loop must complete before other statements can start, and which statements can be executed in parallel with respect to the other statements in the loop. Two general categories of dependencies analyzed in a loop are data dependencies and control dependencies.

Data Dependence

Data dependencies show the relationships between the variables in the code. There are three different types of data dependencies:

  • True Dependence (sometimes referred to as a flow dependence)
  • Anti Dependence
  • Output Dependence

True Dependence

A true dependence occurs when a location in memory is written to before it is read.[1][2][3] It introduces read-after-write (RAW) hazards because the instruction that reads from the location in memory has to wait until it is written to by the previous instruction, or else the reading instruction will read the wrong value.[2] An example of a true dependence is:

 S1: a = 5;
 S2: b = a;

In this example, there is a true dependence between S1 and S2 because variable a is first written in statement S1 and then variable a is read by statement S2. This true dependence can be represented by S1 →T S2.

A true dependence can also be seen when reading and writing between different iterations in a loop. The following example shows a true dependence between different iterations.

 for(j = 1; j < n; j++)
    S1: a[j] = a[j-1];

In this example, a true dependence exists between statement S1 in the jth iteration and S1 in the (j+1)th iteration: a value is written to a[j] in one iteration and then read via a[j-1] in the next iteration. This true dependence can be represented by S1[j] →T S1[j+1].

Anti Dependence

An anti dependence occurs when a location in memory is read before that same location is written to.[1][2][3] This introduces write-after-read (WAR) hazards because the instruction that writes the data into a memory location has to wait until that memory location has been read by the previous instruction or else the reading instruction would read the wrong value.[2] An example of an anti dependence is:

 S1: a = b;
 S2: b = 5;

In this example, there is an anti dependence between statements S1 and S2. This is an anti dependence because variable b is first read in statement S1 and then variable b is written to in statement S2. This can be represented by S1 →A S2. An anti dependence can also occur between different iterations of a loop. The following example shows this case:

 for(j = 0; j < n; j++)
    S1: b[j] = b[j+1];

In this example, there is an anti dependence between the jth iteration of S1 and the (j+1)th iteration of S1. Here, element b[j+1] is read in one iteration, and that same element is written in the next iteration of j. This anti dependence can be represented by S1[j] →A S1[j+1].
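
An anti dependence is a name dependence: no value actually flows from the read to the write, so it can often be removed by renaming the storage. A minimal sketch of this idea (the copy array b_old is an illustrative name, not from the cited sources; b is assumed to hold n+1 elements, as in the loop above):

 for (j = 0; j <= n; j++)
    b_old[j] = b[j];          /* copy the old values once */
 for (j = 0; j < n; j++)
    S1: b[j] = b_old[j+1];    /* every read now targets the copy */

After renaming, the write to b[j] no longer conflicts with any read, so the iterations of the second loop are independent of one another and may execute in parallel, at the cost of an extra copy.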

Output Dependence

An output dependence occurs when a location in memory is written to before that same location is written to again in another statement.[1][2][3] This introduces write-after-write (WAW) hazards because the second instruction to write to a memory location must wait until the first instruction finishes writing to that same location, or else the memory location will contain the wrong value when it is read at a later time.[2] An example of an output dependence is:

  S1: c = 8; 
  S2: c = 15;

In this example, there is an output dependence between statements S1 and S2. Here, the variable c is first written in S1 and then written again in statement S2. This output dependence can be represented by S1 →O S2. An output dependence can also occur between different iterations of a loop. The following code snippet shows an example of this case:

 for(j = 0; j < n; j++) {
    S1: c[j] = j;
    S2: c[j+1] = 5;
 }

In this example, there is an output dependence between S2 in the jth iteration and S1 in the (j+1)th iteration. Here, c[j+1] in statement S2 is written in one iteration. In the next iteration, c[j] in statement S1, which is the same memory location as c[j+1] in the previous iteration, is written again. This output dependence can be represented as S2[j] →O S1[j+1].

Control Dependence

Control dependencies must also be considered when analyzing dependencies between different statements in a loop. Control dependencies are dependencies introduced by the code or the programming algorithm itself. They control the order in which instructions occur within the execution of code.[4] One common example is an "if" statement. "if" statements create branches in a program. The "then" portion of the "if" statement explicitly directs or controls actions to be taken.[3]

 // Code block 1 (CORRECT)
 if (a == b) {
    c = "controlled";
 }
 d = "not controlled";

 // Code block 2 (INCORRECT)
 if (a == b) {
 }
 c = "controlled";
 d = "not controlled";

 // Code block 3 (INCORRECT)
 if (a == b) {
    c = "controlled";
    d = "not controlled";
 }

In this example, the constraints on control flow are illustrated. Code block 1 shows the correct ordering when using an if statement in the C programming language. Code block 2 illustrates a problem in which a statement that is supposed to be controlled by the if statement is no longer under its control. Code block 3 illustrates a problem in which a statement that is not supposed to be controlled by the if statement has been moved under its control. Both of these possibilities could lead to improper program execution and must be considered when parallelizing the statements within a loop.

Loop-Carried Dependence vs. Loop Independent Dependence

Loop-carried dependencies and loop-independent dependencies are determined by the relationships between statements in iterations of a loop. When a statement in one iteration of a loop depends in some way on a statement in a different iteration of the same loop, a loop-carried dependence exists.[1][2][3] However, if a statement in one iteration of a loop depends only on a statement in the same iteration of the loop, this creates a loop-independent dependence.[1][2][3]

 // Code block 1
 for(i = 0; i < 4; i++) {
    S1: b[i] = 8;
    S2: a[i] = b[i-1] + 10;
 }

 // Code block 2
 for(i = 0; i < 4; i++) {
    S1: b[i] = 8;
    S2: a[i] = b[i] + 10;
 }

In this example, code block 1 shows a loop-carried dependence between statement S1 in iteration i and statement S2 in iteration i+1: statement S2 cannot proceed until statement S1 in the previous iteration finishes. Code block 2 shows a loop-independent dependence between statements S1 and S2 in the same iteration.
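
Because code block 2 has no loop-carried dependence, its iterations can safely execute in parallel. A minimal sketch using an OpenMP pragma (OpenMP is not discussed in the cited sources and is used here only for illustration, assuming an OpenMP-capable C compiler; without OpenMP the pragma is ignored and the loop runs sequentially):

 #include <stdio.h>

 int main(void) {
    int a[4], b[4];
    /* Only a loop-independent dependence exists: S2 reads b[i]
       written by S1 in the same iteration, so whole iterations
       may be distributed across processors. */
    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
       b[i] = 8;            /* S1 */
       a[i] = b[i] + 10;    /* S2 */
    }
    for (int i = 0; i < 4; i++)
       printf("a[%d] = %d\n", i, a[i]);
    return 0;
 }

Code block 1, by contrast, could not be parallelized this way, because the read of b[i-1] in S2 needs the value written by S1 in the previous iteration.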

Loop Carried Dependence and Iteration Space Traversal Graphs

An iteration space traversal graph (ITG) shows the path that the code takes when traversing through the iterations of the loop; each iteration is represented with a node.[1] A loop carried dependence graph (LDG) gives a visual representation of all true dependencies, anti dependencies, and output dependencies that exist between different iterations in a loop; each iteration is again represented with a node, and the dependencies are shown as directed edges between nodes.[1]

It is easier to show the difference between the two graphs with a nested for loop.

 for(i = 0; i < 4; i++)
    for(j = 0; j < 4; j++)
       S1: a[i][j] = a[i][j-1] * x;

In this example, there is a true dependence between the jth iteration of statement S1 and the (j+1)th iteration of S1. This can be represented as S1[i,j] →T S1[i,j+1].

[Figures: the iteration-space traversal graph (ITG) and the loop-carried dependence graph (LDG) for this loop.]

Iteration vectors

A specific iteration through a normalized loop is referenced through an iteration vector, which encodes the state of each iteration variable.

For a loop, an iteration vector is a member of the Cartesian product of the bounds for the loop variables. In the normalized form given previously, this space is defined to be U1 × U2 × ... × Un. Specific instances of statements may be parametrized by these iteration vectors, and they are also the domain of the array subscript functions found in the body of the loop. Of particular relevance, these vectors form a lexicographic order which corresponds with the chronological execution order.
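
For example, for a doubly nested normalized loop with U1 = 2 and U2 = 3, the iteration vectors in lexicographic (and therefore execution) order are (0,0), (0,1), (0,2), (1,0), (1,1), (1,2). A minimal sketch that enumerates them (illustrative code, not from the cited sources):

 #include <stdio.h>

 int main(void) {
    int U1 = 2, U2 = 3;
    /* visits the iteration vectors (i1, i2) in lexicographic
       order, which matches the sequential execution order */
    for (int i1 = 0; i1 < U1; i1++)
       for (int i2 = 0; i2 < U2; i2++)
          printf("(%d, %d)\n", i1, i2);
    return 0;
 }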

Dependence Vectors

To classify data dependence, compilers use two important vectors: the distance vector (σ), which indicates the distance between fn and hn, and the direction vector (ρ), which indicates the corresponding direction, basically the sign of the distance.

The distance vector is defined as σ = (σ1, ..., σk), where σn = hn - fn.

The direction vector is defined as ρ = (ρ1, ..., ρk) where ρn is:

  • (<) if σn > 0 => [fn < hn]
  • (=) if σn = 0 => [fn = hn]
  • (>) if σn < 0 => [fn > hn]

A direction vector whose leftmost non-'=' entry is '>' cannot exist: that would mean the sink of the dependence occurs before the source, which is impossible.
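
A minimal worked example, using the notation above (the loop is illustrative, not taken from the cited sources):

 for (i = 0; i < n; i++)
    S1: a[i] = a[i+2];   /* write subscript f1(i) = i, read subscript h1(i) = i+2 */

Here σ1 = h1 - f1 = 2, so the direction is (<): iteration i reads a[i+2] before iteration i+2 writes that same element, giving an anti dependence with distance 2 whose leftmost (and only) direction entry is <.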

Classification

A dependence between two operations a and b can be classified according to the following criteria:

  • Operation type
  • Chronological order
    • If the direction vector between a and b equals (<), this is a lexically forward dependence
    • If the direction vector between a and b equals (=), this is a self-dependence
    • If the direction vector between a and b equals (>), this is a lexically backward dependence
  • Loop dependence
    • If all distances (σ) are zero (same place in memory), this is a loop-independent dependence
    • If at least one distance is non-zero, this is a loop-carried dependence

Plausibility

Some loop dependence relations can be parallelized (or vectorized) and some cannot. Each case must be analysed separately, but as a rule of thumb, the following table covers most cases:

 ρ \ order     Lexically forward   Self-dependence                  Lexically backward
 positive (<)  plausible           plausible                        plausible
 zero (=)      implausible         δa: plausible, δf: implausible   plausible
 negative (>)  implausible         implausible                      implausible

Some implausible dependences can be transformed into plausible ones, for example by rearranging the statements.

Alias detection

Inside loops, the same variable can be accessed for both reading and writing, at the same or a different location within each iteration. Moreover, the same region of memory can be accessed via different variables. When the same region of memory can be accessed through more than one variable, the variables are said to be aliases.

Some aliases are very simple to detect:

a = b;                /* a and b now refer to the same array */
for (i = 0; i < MAX; ++i)
  a[i] = b[i+1];      /* reads through b, writes through a */
a[MAX] = 0;

It is obvious that b is an alias of a; thus, this code is actually shifting the array to the left. But this is not always so obvious. For instance, the standard C library function strcpy() copies one string to another, but the caller could provide overlapping regions like this:

strcpy(x, x+1);

where the internal loop could be implemented as:

while(*src != 0) {
  *dst = *src;
  src++; dst++;
}

The dependence between src and dst is not obvious from within the function; every caller would have to be analysed to make sure no overlap can occur. In the case of a library function, there is no way to ensure this, so the compiler has to assume one way or the other. If the compiler assumes there is no alias, then whenever the regions overlap the result is undefined behaviour. If it assumes there is an alias, the code remains unoptimized in every case.

Some compilers accept special keywords, such as C's restrict qualifier, that let them assume no aliasing.
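
A minimal sketch of how restrict (a standard C99 qualifier) conveys the no-alias guarantee; the function name and parameters are illustrative:

 #include <stddef.h>

 /* The restrict qualifiers promise that dst and src never overlap,
    so the compiler may reorder or vectorize the loop freely. It is
    the caller's responsibility to honour that promise. */
 void copy_n(char *restrict dst, const char *restrict src, size_t n)
 {
    for (size_t i = 0; i < n; i++)
       dst[i] = src[i];
 }

A call such as copy_n(x, x + 1, n) would violate that contract and invoke undefined behaviour, exactly the overlapped strcpy() case above.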

Techniques

Several established techniques exist for tackling the loop dependence problem. For determining whether a dependence exists, the GCD test and the Banerjee test are the most general tests in common use, while a variety of techniques exist for simpler cases.
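
As an illustration of the flavour of these tests, here is a minimal sketch of a one-dimensional GCD test: for a write with subscript a*i + b and a read with subscript c*i + d, a dependence can exist only if gcd(a, c) divides d - b. The helper names are illustrative, and a real implementation would also account for loop bounds and multiple dimensions:

 #include <stdio.h>
 #include <stdlib.h>

 static int gcd(int x, int y)
 {
    while (y != 0) { int t = x % y; x = y; y = t; }
    return x;
 }

 /* returns 1 if a dependence between the write a[a*i + b] and the
    read a[c*i + d] cannot be ruled out, 0 if proven independent */
 static int gcd_test(int a, int b, int c, int d)
 {
    int g = gcd(abs(a), abs(c));
    if (g == 0)                /* both subscripts are constants */
       return b == d;
    return (d - b) % g == 0;
 }

 int main(void)
 {
    /* a[2*i] = ...;  ... = a[2*i + 1];
       gcd(2, 2) = 2 does not divide 1 -> no dependence */
    printf("%d\n", gcd_test(2, 0, 2, 1));   /* prints 0 */
    return 0;
 }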

Further reading

  • Bik, Aart J.C. (2004). The Software Vectorization Handbook. Intel Press. ISBN 0-9743649-2-4.
  • Kennedy, Ken; Allen, Randy (2001). Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann. ISBN 1-55860-286-0.
  • Muchnick, Steven S. (1997). Advanced Compiler Design and Implementation. Morgan Kaufmann. ISBN 1-55860-320-4.
  • Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems. Solihin Pub. ISBN 978-0-9841630-0-7.

References

  1. ^ Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems. Solihin Pub. ISBN 978-0-9841630-0-7.
  2. ^ Devan, Pradip; Kamat, R.K. (2014). "A Review - LOOP Dependence Analysis for Parallelizing Compiler". International Journal of Computer Science and Information Technologies. 5.
  3. ^ Hennessy, John; Patterson, David (2012). Computer Architecture: A Quantitative Approach. Morgan Kaufmann. pp. 152–156. doi:10.1016/B978-0-12-383872-8.00003-3. ISBN 978-0-12-383872-8.
  4. ^ Allen, J. R.; Kennedy, Ken; Porterfield, Carrie; Warren, Joe (1983). "Conversion of Control Dependence to Data Dependence". Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '83). New York, NY, USA: ACM. pp. 177–189. doi:10.1145/567067.567085. ISBN 0-89791-090-7.