HP C/HP-UX Programmer's Guide: HP 9000 Computers > Chapter 8 Threads and Parallel Processing

Getting Started with Parallelizing C Programs

Here are some basic tasks to help you get started with parallelizing C programs.

Transforming Loops for Parallel Execution (+Oparallel)

The +Oparallel option causes the compiler to transform eligible loops for parallel execution on multiprocessor machines.
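To make "eligible loops" more concrete, the sketch below contrasts two loops. This guide does not list the compiler's eligibility rules, so the distinction shown here (independent iterations versus a loop-carried dependence) is a general assumption about parallelizing compilers, not a statement of what +Oparallel accepts. The function names are hypothetical, and the first function assumes the three arrays do not overlap.

```c
#include <stddef.h>

/*
 * Each iteration reads a[i] and b[i] and writes only sum[i], so no
 * iteration depends on another. Assumption: a loop of this shape is a
 * typical candidate for parallel execution, provided the arrays do
 * not overlap in memory.
 */
void add_vectors(const double *a, const double *b, double *sum, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        sum[i] = a[i] + b[i];
    }
}

/*
 * Each iteration reads the element written by the previous iteration
 * (a loop-carried dependence), so the iterations must run in order.
 * Assumption: a loop of this shape is left as serial code.
 */
void running_total(double *values, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++) {
        values[i] += values[i - 1];
    }
}
```

Both functions compile and behave identically with or without parallelization; only the way the loop iterations are scheduled could differ.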

The following command lines compile (without linking) three source files: x.c, y.c, and z.c. The files x.c and y.c are compiled for parallel execution. The file z.c is compiled for serial execution, even though its object file will be linked with x.o and y.o.

cc +O3 +Oparallel -c x.c y.c
cc +O3 -c z.c

The following command line links the three object files, producing the executable file para_prog:

cc +O3 +Oparallel -o para_prog x.o y.o z.o

As this command line implies, if you link and compile separately, you must use cc, not ld. The command line to link must also include the +Oparallel and +O3 options in order to link in the right startup files and runtime support.

Setting the Number of Threads Used in Parallel

Use the MP_NUMBER_OF_THREADS environment variable to set the number of processors that are to execute your program in parallel. If you do not set this variable, it defaults to the number of processors on the executing machine.

From the C shell, the following command sets MP_NUMBER_OF_THREADS to indicate that programs compiled for parallel execution can execute on two processors:

setenv MP_NUMBER_OF_THREADS 2

If you use the Korn shell, the command is:

export MP_NUMBER_OF_THREADS=2
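The parallel runtime reads MP_NUMBER_OF_THREADS itself, so a program does not need to. For logging or diagnostics, though, a program may want to report the value in effect. The following is a minimal sketch of that idea. The function names are hypothetical, the caller supplies the processor count used as the default, and falling back to that default for a missing or malformed value is an assumption: this guide does not say how the runtime treats invalid settings.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a thread-count string. Returns fallback when the string is
 * missing, empty, not a whole number, or less than 1 (assumed policy).
 */
int parse_thread_count(const char *text, int fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0') {
        return fallback;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > INT_MAX) {
        return fallback;
    }
    return (int)value;
}

/*
 * Report the thread count requested through MP_NUMBER_OF_THREADS.
 * processor_count is supplied by the caller and stands in for the
 * documented default (the number of processors on the machine).
 */
int requested_thread_count(int processor_count)
{
    return parse_thread_count(getenv("MP_NUMBER_OF_THREADS"),
                              processor_count);
}
```

For example, with MP_NUMBER_OF_THREADS set to 2 as shown above, requested_thread_count(8) returns 2; with the variable unset, it returns 8.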

Determining Idle Thread States

Use the MP_IDLE_THREADS_WAIT environment variable to control how idle threads wait. Idle threads can either be suspended or spin-wait.

This variable takes an integer value n. For n less than 0, the threads spin-wait. For n equal to or greater than 0, the threads spin-wait for n milliseconds before being suspended.

By default, idle threads spin-wait briefly after creation or a join. They then suspend themselves if they receive no work.
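The rule for interpreting n can be written down as a small decision function. This is only an illustration of the rule stated above, not the runtime's implementation; the type and function names are hypothetical.

```c
/* How an idle thread waits, as selected by MP_IDLE_THREADS_WAIT. */
typedef enum {
    IDLE_SPIN_ALWAYS,       /* n < 0: spin-wait and never suspend      */
    IDLE_SPIN_THEN_SUSPEND  /* n >= 0: spin-wait n ms, then suspend    */
} idle_policy;

/* Map the integer value n of MP_IDLE_THREADS_WAIT to a policy. */
idle_policy idle_policy_for(long n)
{
    return (n < 0) ? IDLE_SPIN_ALWAYS : IDLE_SPIN_THEN_SUSPEND;
}

/*
 * Spin-wait time in milliseconds before suspension, or -1 when the
 * thread never suspends. By the rule above, n equal to 0 means the
 * thread spins for 0 ms, that is, it suspends without spinning.
 */
long idle_spin_milliseconds(long n)
{
    return (n < 0) ? -1L : n;
}
```

A negative value trades processor time for the fastest possible wake-up, while a small non-negative value releases the processor sooner when no work arrives.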

Accessing the Pthreads Library

Pthreads (POSIX threads) is a library of thread-management routines. For information on Pthread routines, see the pthread(3t) man page.

To use the Pthread routines, your program must include the <pthread.h> header file, and the Pthreads library must be explicitly linked to your program. For example:

% cc -D_POSIX_C_SOURCE=199506L prog.c -lpthread

The -D_POSIX_C_SOURCE=199506L string specifies the appropriate POSIX revision level. In this case, the level is 199506L.
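As an illustration of what such a program might contain, the sketch below creates one thread with pthread_create and waits for it with pthread_join, both standard POSIX routines declared in <pthread.h>. The function names square_in_place and square_on_thread are hypothetical. A file containing this code, plus a main function that calls it, would be compiled and linked with the command shown above.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread start routine: squares, in place, the int that arg points to. */
static void *square_in_place(void *arg)
{
    int *value = (int *)arg;

    *value = *value * *value;
    return NULL;
}

/*
 * Run square_in_place on a separate thread and wait for it to finish.
 * Returns 0 and stores the square in *result on success; otherwise
 * returns the error number from pthread_create or pthread_join.
 */
int square_on_thread(int input, int *result)
{
    pthread_t worker;
    int status;
    int value = input;

    status = pthread_create(&worker, NULL, square_in_place, &value);
    if (status != 0) {
        return status;
    }

    /* value is safe to read only after the join completes. */
    status = pthread_join(worker, NULL);
    if (status != 0) {
        return status;
    }

    *result = value;
    return 0;
}
```

Pthread routines report failure through their return value rather than through errno, which is why the sketch checks and passes back each status.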

Profiling Parallelized Programs

Profiling a program that has been compiled for parallel execution is performed in much the same way as it is for non-parallel programs:

  1. Compile the program with the option -G.

  2. Run the program to produce profiling data.

  3. Run gprof against the program.

  4. View the output from gprof.

The differences are:

  • Running the program in Step 2 produces a gmon.out file for the master process and gmon.out.1, gmon.out.2, and so on for each of the slave processes. If your program executes on two processors, Step 2 produces two files, gmon.out and gmon.out.1.

  • The flat profile that you view in Step 4 indicates loops that were parallelized with the following notation:

    routine_name##pr_line_0123

    where routine_name is the name of the routine containing the loop, pr (parallel region) indicates that the loop was parallelized, and 0123 is the line number of the beginning of the loop or loops that are parallelized.
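If you post-process gprof output, the notation can be split into its parts mechanically. The following sketch assumes entries look exactly like the example above: a routine name, the marker ##pr_line_, and a decimal line number with nothing after it. The function name parse_parallel_region is hypothetical, and real gprof output may contain variations this sketch does not handle.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "routine_name##pr_line_0123" into its routine name and line
 * number. Returns 1 on success; returns 0 if symbol does not match
 * the assumed format or the name does not fit in routine.
 */
int parse_parallel_region(const char *symbol, char *routine,
                          size_t routine_size, int *line)
{
    static const char marker[] = "##pr_line_";
    const char *found = strstr(symbol, marker);
    const char *digits;
    char *end;
    size_t name_length;
    long value;

    if (found == NULL) {
        return 0;                     /* not a parallelized-loop entry */
    }
    name_length = (size_t)(found - symbol);
    if (name_length == 0 || name_length >= routine_size) {
        return 0;                     /* empty name or buffer too small */
    }
    digits = found + strlen(marker);
    if (*digits < '0' || *digits > '9') {
        return 0;                     /* line number must start with a digit */
    }
    value = strtol(digits, &end, 10); /* base 10, so "0123" reads as 123 */
    if (*end != '\0' || value > INT_MAX) {
        return 0;
    }

    memcpy(routine, symbol, name_length);
    routine[name_length] = '\0';
    *line = (int)value;
    return 1;
}
```

Entries for routines without parallelized loops lack the marker, so the function returns 0 for them and they can be reported unchanged.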

© Hewlett-Packard Development Company, L.P.