HP C/HP-UX Programmer's Guide: HP 9000 Computers > Chapter 8 Threads and Parallel Processing

Guidelines for Parallelizing C Programs


To ensure the best performance from a parallel program, do not run more than one parallel program on a multiprocessor machine at the same time. Running two or more parallel programs simultaneously, or running one parallel program on a heavily loaded system, slows performance.

You should run a parallel-executing program at a higher priority than any other user program; see rtprio(1) for information about setting real-time priorities.

Conditions Inhibiting Loop Parallelization

The following sections describe different conditions that can inhibit parallelization.

Calling Routines with Side Effects

The compiler will not parallelize any loop containing a call to a routine that has side effects. A routine has side effects if it does any of the following:

  • Modifies its arguments.

  • Modifies an extern, static, or global variable.

  • Redefines variables that are local to the calling routine.

  • Performs I/O.

  • Calls another subroutine or function that does any of the above.
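
For illustration, consider the following sketch (the routine name log_value and the surrounding code are invented for this example). The called routine modifies a static variable, so any loop containing a call to it is not parallelized:

static double sum = 0.0;          /* file-scope accumulator */

/* Side effect: modifies the static variable sum. */
double log_value(double x)
{
    sum += x;
    return x;
}

void scale(double a[], double b[], int n)
{
    int i;

    /* Not parallelized: the call to log_value has side effects. */
    for (i = 0; i < n; i++)
        a[i] = log_value(a[i]) * b[i];
}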

Indeterminate Iteration Counts

If the compiler cannot determine a loop's iteration count before the loop executes, it does not parallelize the loop. The reason for this limitation is that the runtime code must know the iteration count to determine how many iterations to distribute to the different processors for execution.

The following conditions can prevent a runtime count:

  • The loop is an infinite loop.

  • The loop contains a conditional break statement or a goto that branches out of the loop.

  • The loop modifies either the loop-control or loop-limit variable.

  • The loop is a while construct and the condition being tested is defined within the loop.
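
The following sketch illustrates two of these conditions; the code and names are invented for the example:

void examples(double a[], int n)
{
    int i;

    /* Conditional break: the iteration count cannot be known
       before the loop executes, so the loop is not parallelized. */
    for (i = 0; i < n; i++) {
        if (a[i] < 0.0)
            break;
        a[i] = a[i] * 2.0;
    }

    /* This loop modifies its own limit variable, so the count
       is again indeterminate and the loop runs serially. */
    for (i = 0; i < n; i++) {
        if (a[i] == 0.0)
            n--;
        a[i] = a[i] + 1.0;
    }
}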

Data Dependence

When a loop is parallelized, the iterations are executed independently on different processors, and the order of execution differs from the serial order that occurs on a single processor. For a loop whose iterations are independent, this reordering is not a problem: the iterations could execute in any order with no effect on the results. Consider the following loop:

for (i=0; i<5; i++)
    a[i] = a[i] * b[i];

In this example, the array a always ends up with the same data regardless of whether the order of execution is 0-1-2-3-4, 4-3-2-1-0, 3-1-4-0-2, or any other order. The independence of each iteration from the others makes the loop an eligible candidate for parallelization.
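
To make the distribution concrete, here is a minimal sketch that splits the independent loop across two POSIX threads by hand. It illustrates the idea only; it is not the code the compiler generates, and the helper names worker and chunk_t and the two-way split are invented for this example:

#include <pthread.h>

#define N 5

double a[N] = {1.0, 2.0, 3.0, 4.0, 5.0};
double b[N] = {2.0, 2.0, 2.0, 2.0, 2.0};

typedef struct { int lo, hi; } chunk_t;    /* half-open range [lo, hi) */

/* Each thread executes its own slice of the iteration space.
   Because no iteration reads data written by another, the
   slices can run in any order, or at the same time. */
static void *worker(void *arg)
{
    chunk_t *c = (chunk_t *)arg;
    int i;

    for (i = c->lo; i < c->hi; i++)
        a[i] = a[i] * b[i];
    return NULL;
}

int main(void)
{
    pthread_t t;
    chunk_t first = { 0, N / 2 };
    chunk_t second = { N / 2, N };

    pthread_create(&t, NULL, worker, &first);   /* iterations 0 and 1 */
    worker(&second);                            /* iterations 2, 3, 4 */
    pthread_join(t, NULL);
    return 0;
}

Whichever chunk finishes first, a receives the same final values.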

No such independence exists in the following loop:

for (i=1; i<5; i++)
    a[i] = a[i-1] * b[i];

In this loop, the order of execution does matter. The data used in iteration i depends upon the data produced in the previous iteration [i-1]. The array a would end up with very different data if the order of execution were anything other than 1-2-3-4. The data dependence in this loop thus makes it ineligible for parallelization.

Not all data dependences inhibit parallelization. The following paragraphs discuss some of the exceptions.

Nested Loops and Matrices

Some nested loops that operate on matrices may have a data dependence in the inner loop only, allowing the outer loop to be parallelized. Consider the following:

for (i=0; i<10; i++)
    for (j=1; j<100; j++)
        a[i][j] = a[i][j-1] + 1;

The data dependence in this nested loop occurs in the inner [j] loop: each access of a[i][j] depends upon the element a[i][j-1] having been assigned in the previous iteration. If the iterations of the [j] loop were to execute in any order other than the one in which they would execute on a single processor, the matrix would be assigned different values. The inner loop, therefore, must not be parallelized.

But no such data dependence appears in the outer loop: each iteration of the [i] loop works on its own row of the matrix, independent of every other row. Consequently, the compiler can safely distribute entire rows of the matrix to execute on different processors; the data assignments are the same regardless of the order in which the rows are processed, so long as each row executes in serial order.
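
One way to see the outer loop's independence is to factor the inner loop into a routine that touches only row i; the calls for different rows could then run on different processors. This is a conceptual sketch only: the compiler analyzes the original nested loop directly, and an explicit call like this would itself inhibit parallelization because compute_row modifies its argument:

/* Reads and writes only row i, so calls for different
   values of i are independent of one another. */
void compute_row(double a[10][100], int i)
{
    int j;

    for (j = 1; j < 100; j++)        /* serial: a[i][j] needs a[i][j-1] */
        a[i][j] = a[i][j-1] + 1;
}

void compute_all(double a[10][100])
{
    int i;

    for (i = 0; i < 10; i++)         /* rows are independent */
        compute_row(a, i);
}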

Assumed Dependencies

When analyzing a loop, the compiler errs on the safe side: it assumes that what looks like a data dependence really is one, and it does not parallelize the loop. Consider the following:

for (i=100; i<200; i++)
    a[i] = a[i-k];

The compiler assumes that a data dependence exists in this loop because data defined in an earlier iteration appears to be used in a later iteration. However, if the value of k is 100, the dependence is assumed rather than real, because every element a[i-k] that the loop reads is defined outside the loop.
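
For example, if k is known always to be 100, writing the constant into the subscript lets the compiler see that every element read lies outside the range being assigned. This rewrite is a sketch added here for illustration, not part of the original text:

void copy_upper(double a[200])
{
    int i;

    /* Reads a[0]..a[99]; writes a[100]..a[199]. The ranges do
       not overlap, so no iteration depends on another and the
       loop is eligible for parallelization. */
    for (i = 100; i < 200; i++)
        a[i] = a[i-100];
}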
