Testing high-performance computing applications

Title : Testing High Performance Computing Applications

Overview

High Performance Computing Applications consists of concurrent programs designed using multi-threaded as well as multi-process models. The application may consists of various parallel constructs (threads, local processes, distributed processes, etc.) with varying degree of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as that of sequential programs, unlike sequential programs, they demonstrate non-deterministic behavior. As the number of interactions between the various parallel constructs increase, the probability of introduction of bugs increases exponentially. Race conditions, data races, deadlocks, missed signals, live lock are some of the most common and frequent errors in high performance computing applications.

Challenges

Broadly speaking, parallel programs can be divided into two categories, viz. explicitly parallel programs and implicitly parallel programs. If parallel language constructs defined for process creation, communication and synchronization are used explicitly while developing a parallel program, then such application becomes an explicitly parallel application. On the other hand, if a tool or parallelizing compiler is used to convert a serial program into a parallel one, then it becomes a case of implicitly parallel program. Both these categories of high performance software are equally prone to many bugs.

Heisenbugs

It is expected that when a concurrent application is tested, it should execute correctly on every possible thread schedule in the underlying operating system, but the traditional methods of testing can only detect few bugs. This is chiefly because of Heisenbugs [2] problem. A Heisenbug is an error which changes its behavior or disappears completely when an attempt is made to isolate and probe them via debugger by adding some constructs like synchronization requests or delay statements. This issue causes the bug to go undetected and unfixed.

Non-repeatability

Another issue is caused due to the unpredictable behavior of the scheduler. The differences in the load on the system vary the behavior of the scheduler changes accordingly. Also, this behavior cannot be changed manually. Thus, to counter the indeterminism introduced by the Operating System scheduler, the program needs to be executed many times under various execution environments. Still, it is not guaranteed that the bug can be reproduced. Most of the times, program runs correctly, but the bug exists in the application and its effect is visible only when some specific conditions are matched. As a result, non-repeatability of the concurrent programs is a major source of roadblock for detecting error. As an example, consider the following.

< INSERT PROGRAMS WITH DEADLOCK PROBLEMS HERE >

Clearly, this has a problem of causing deadlocks. Yet, it may cause deadlock in some runs of the program and in some other runs, it may run successfully.

Detecting Probe Effect

Probe effect is seen in parallel programs when delay-statements are inserted in the parallel programs having synchronization problems. This effect, like Heisenbugs, alters the behavior of the high performance software. It can either cause non-functioning concurrent programs to work or properly working parallel programs may cease to function when delays are inserted removed or altered. Detecting the source of a probe effect poses a great challenge in testing parallel applications.

Testing Strategies

The differences between sequential and concurrent programs lead to the differences in their testing strategies. Existing strategies for the sequential programs can be modified to make them suitable for concurrent applications. Some new strategies have also been developed. Conventionally, testing of the program includes designing the test cases and checking if the program ‘works’ as expected. Thus, the various errors in specification, functionality, etc. are detected by actually running the application and subjecting it to testing methods like Functionality Testing, White Box, Black Box, Grey Box Testing and so on. [1] In addition to these tests, there is a trend of using static analysis for detecting the errors in high performance software. The source code of the application is analyzed using various static analysis methods like Data Flow Analysis, Control Flow Analysis, Cyclomatic Complexities, Thread Escape Analysis, and Static Slicing Analysis to find problems in parallel applications. Using static analysis before the functionality testing can save valuable time for the developer and tester. It can not only detect ‘what the error is’ but also it can find the source of the error. Here are some of the strategies to test the parallel applications.

Deterministic Scheduling / Reproducible Testing [2]

The indeterminacy of scheduling originates from two sources.

Time slicing:

Scheduler makes a context switch at equal intervals of time. This time interval depends on speed of individual processors, memory-cache hierarchy state and the load on the system. Even on the same processor, at same load conditions, the time slice interval varies slightly due to minor variations in frequency of the clock oscillator.

Page Faults

Scheduler also suspends a program if a page fault occurs and lets the other threads to proceed while the system fetches the page required for previous thread from disk into the memory. As the occurrence of page faults depends upon other processes running in the system, it makes the timing of a context switch indeterminate.

To make the concurrent programs repeatable, an external scheduler is used. The program, which is being tested, is instrumented to add calls to this new scheduler. Such calls are made at the beginning and end of each thread as well as before every synchronization request. This new scheduler, which is independent of OS scheduler, selectively blocks different threads of execution by maintaining a semaphore associated with each thread, such that only one thread is ready for execution at any given time. Thus, it converts parallel non-deterministic application into a serial execution sequence in order to achieve repeatability. The number of scheduling decisions made by the serializing scheduler is given by – (N * K / P)*{(N + P)!} Where N = number of threads K = potential context switch points P = budget of pre-emptive context switches

Feedback Directed Testing

To obtain more accurate results using deterministic scheduling policy, an alternate approach can be chosen. It is known that few pre-emptions in the concurrent program, which are properly placed, can quickly detect many bugs related to data-races. [2] It has also been proved that bugs are found in clusters. That is, if one bug is found, there is a high probability that more bugs exists in the same region of code. Thus, by using a feedback loop, data races are detected by exploring different scheduling paths, and then calls to the new scheduler are inserted at racing-program locations. Finally, by allowing the racing-locations to execute in different scheduling order, the results are analyzed for unprecedented behavior.

Timing Related Testing [5]

This strategy is employed to ensure that the application is not prone to the Probe Effect. Sources of errors causing Probe Effect can range from task creation issues to synchronization and communication problems. Requirements of timing related tests are – [5]

Delay duration must be changed
Delay points must cover necessary program locations
Delay statements must be inserted, removed or relocated to induce probe effect

This strategy defines number of test cases per input data set as – ⁿC₁ + ⁿC₁ + … + ⁿC₁ = 2ⁿ -1 Where n = total number of synchronization, process creation and communication calls.

This equation has exponential order. In order to reduce the number of test cases, either Deterministic Execution Method (DET) or Multiple Execution Technique (MET) is used. Various issues must be handled for successful timing-related testing.

Achieving delayed execution

Addition of delays is a straightforward task. A typical sleep() statement, supported by almost all the languages can be used to insert delays.

Deciding where to insert delays

The locations in the program where the delays must be inserted to induce probe effect are known as delay-points. As the objective of timing related test cases is to detect synchronization, communication and thread creation errors, the delay statements are added just in front of these statements.

Advantages

Easy to implement on multiple processors without any need of ordering the synchronization requests.
No need to generate concurrency graph
More effective for fault detection
Total number of test cases are less due to static analysis

All Du-Path Testing

This method applies the concept of define-use pair, in order to determine the paths to be tested.