HP C/HP-UX Programmer's Guide: HP 9000 Computers > Chapter 8 Threads and Parallel Processing

Parallel Processing Options

HP C provides the following optimization options for parallelizing C programs:

+O[no]autopar

Optimization level(s): 3, 4

Default: +Oautopar if +Oparallel is enabled

When used with +Oparallel, the +Onoautopar option causes the compiler to parallelize only those loops marked by the loop_parallel or prefer_parallel pragmas. Because the compiler does not automatically find parallel tasks or regions, user-specified task and region parallelization is not affected by this option.

A loop is safe to parallelize if it has an iteration count that can be determined at runtime before loop invocation, and contains no loop-carried dependences, procedure calls, or I/O operations. A loop-carried dependence exists when one iteration of a loop assigns a value to an address that is referenced or assigned on another iteration.
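As an illustration of the distinction above, the following sketch contrasts a loop whose iterations are independent with one that has a loop-carried dependence. The function names are hypothetical. The #pragma _CNX spelling and the __hpux guard are assumptions based on the conventions of the Parallel Programming Guide for HP-UX Systems, not on anything stated in this section.

```c
#include <stddef.h>

/* Hypothetical example: each iteration writes only a[i] and reads only
 * b[i], so (provided a and b do not partially overlap) no iteration
 * touches an address used by another. The trip count n is known before
 * the loop starts. */
void scale(double *a, const double *b, double k, size_t n)
{
    size_t i;
    /* Assumed pragma spelling, guarded so other compilers skip it.
     * With +Oparallel +Onoautopar, only loops marked this way are
     * considered for parallelization. */
#ifdef __hpux
#pragma _CNX prefer_parallel
#endif
    for (i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
 * i - 1 assigned, so the iterations must run in order. */
void running_sum(double *a, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

The first loop meets every condition listed above. The second does not, because reordering its iterations would change the values read from a[i - 1].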

+O[no]dynsel

Optimization level(s): 3, 4

Default: +Odynsel if +Oparallel is enabled

When specified with +Oparallel, +Odynsel (the default) enables workload-based dynamic selection. For parallelizable loops whose iteration counts are known at compile time, +Odynsel causes the compiler to generate either a parallel or a serial version of the loop, depending on which is more profitable.

This optimization also causes the compiler to generate both parallel and serial versions of parallelizable loops whose iteration counts are unknown at compile time. At runtime, the loop workload is compared to parallelization overhead, and the parallel version is run only if it is profitable to do so.

The +Onodynsel option disables dynamic selection and tells the compiler that it is profitable to parallelize all parallelizable loops. The dynsel pragma can be used to enable dynamic selection for specific loops when +Onodynsel is in effect.

See Also: “dynsel[(trip_count=n)]”
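The runtime decision described above can be pictured with the following sketch. It is a conceptual model only: the cost constants, the function names, and the comparison are hypothetical, and the code the compiler actually generates is not documented in this section.

```c
#include <stddef.h>

/* Hypothetical cost model, not the compiler's real one. */
#define PARALLEL_OVERHEAD  10000UL  /* assumed cost of starting threads */
#define COST_PER_ITERATION 4UL      /* assumed cost of one loop body    */

/* Returns 1 when the estimated workload exceeds the assumed overhead. */
int parallel_is_profitable(size_t trip_count)
{
    return trip_count * COST_PER_ITERATION > PARALLEL_OVERHEAD;
}

/* Serial version of the loop. */
static void add_serial(double *a, const double *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] += b[i];
}

/* Stand-in for the parallel version. A real one would divide the
 * iterations among threads; this placeholder computes the same result. */
static void add_parallel(double *a, const double *b, size_t n)
{
    add_serial(a, b, n);
}

/* Both versions exist; the choice is made once n is known at runtime. */
void add_dynsel(double *a, const double *b, size_t n)
{
    if (parallel_is_profitable(n))
        add_parallel(a, b, n);
    else
        add_serial(a, b, n);
}
```

With +Onodynsel the test would in effect disappear and the parallel version would always run.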

+O[no]loop_block

Optimization level(s): 3, 4

Default: +Onoloop_block

The +O[no]loop_block option enables [disables] blocking of eligible loops for improved cache performance. The +Onoloop_block option disables automatic and directive-specified loop blocking. For more information on loop blocking, see the Parallel Programming Guide for HP-UX Systems.
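To show what blocking means, the sketch below applies the transformation by hand to a matrix transpose. The function names and the block-size parameter are hypothetical, and the compiler's own choice of loops and block sizes may differ.

```c
#include <stddef.h>

/* Unblocked transpose of an n x n matrix stored row by row. The inner
 * loop walks src by column, so for large n successive reads land on
 * different cache lines. */
void transpose_plain(double *dst, const double *src, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            dst[i * n + j] = src[j * n + i];
}

/* Blocked transpose: the same iterations, reordered so that each
 * block x block tile is finished while its data is still in cache.
 * block must be greater than zero. */
void transpose_blocked(double *dst, const double *src, size_t n,
                       size_t block)
{
    size_t ii, jj, i, j;
    for (ii = 0; ii < n; ii += block)
        for (jj = 0; jj < n; jj += block)
            /* The extra bounds handle a partial tile at the edges. */
            for (i = ii; i < ii + block && i < n; i++)
                for (j = jj; j < jj + block && j < n; j++)
                    dst[i * n + j] = src[j * n + i];
}
```

Both functions produce the same result; only the order in which memory is visited changes.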

+O[no]loop_unroll_jam

Optimization level(s): 3, 4

Default: +Onoloop_unroll_jam

The +O[no]loop_unroll_jam option enables [disables] loop unrolling and jamming. The +Onoloop_unroll_jam option disables both automatic and directive-specified unroll and jam. Unroll and jam unrolls an outer loop and fuses the resulting copies of the inner loop, so that values held in registers are reused across several outer-loop iterations. For more information on the unroll and jam optimization, see the Parallel Programming Guide for HP-UX Systems.
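The transformation can be illustrated by applying it by hand to a matrix-vector product. This is a sketch with hypothetical function names and an unroll factor of 2 chosen for the example; the factor the compiler selects is not stated here.

```c
#include <stddef.h>

/* Original nest: y[i] = sum over j of a[i][j] * x[j], with a stored
 * row by row. */
void matvec_plain(double *y, const double *a, const double *x,
                  size_t rows, size_t cols)
{
    size_t i, j;
    for (i = 0; i < rows; i++) {
        double sum = 0.0;
        for (j = 0; j < cols; j++)
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* Outer loop unrolled by 2, and the two inner loops jammed into one. */
void matvec_unroll_jam(double *y, const double *a, const double *x,
                       size_t rows, size_t cols)
{
    size_t i, j;
    for (i = 0; i + 1 < rows; i += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (j = 0; j < cols; j++) {
            double xj = x[j];              /* loaded once, used twice */
            sum0 += a[i * cols + j] * xj;
            sum1 += a[(i + 1) * cols + j] * xj;
        }
        y[i] = sum0;
        y[i + 1] = sum1;
    }
    /* Remainder iteration when rows is odd. */
    if (i < rows) {
        double sum = 0.0;
        for (j = 0; j < cols; j++)
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}
```

In the jammed loop each element of x is loaded once for two rows, which is the register reuse the optimization aims for.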

+O[no]parallel

Optimization level(s): 3, 4

Default: +Onoparallel

The +Oparallel option reduces the time it takes to execute a single process by running parts of it in parallel on a multiprocessor system.

NOTE: If you compile one or more files in an application using +Oparallel, then the application must be linked (using the compiler driver) with the +Oparallel option to link in the proper start-up files and runtime support.

The +Oparallel option causes the compiler to:

  • Recognize the directives and pragmas that involve parallelism, such as begin_tasks, loop_parallel, and prefer_parallel

  • Look for opportunities for parallel execution in loops

The following methods can be used to specify the number of processors used in executing your parallel programs:

  • loop_parallel(max_threads=m) pragma

  • prefer_parallel(max_threads=m) pragma

  • MP_NUMBER_OF_THREADS environment variable, which is read at runtime by your program. If this variable is set to a positive integer n, your program executes on n processors. n must be less than or equal to the number of processors on the system where the program is executing.

    See “Setting the Number of Threads Used in Parallel” for an example.
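The methods listed above might be combined as in the following sketch. The helper functions are hypothetical and exist only so a program can report the setting; the parallel runtime reads MP_NUMBER_OF_THREADS itself. The #pragma _CNX spelling and the __hpux guard are assumptions not confirmed by this section.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: interprets a value of MP_NUMBER_OF_THREADS.
 * Returns the thread count, or 0 if the value is absent or is not a
 * positive integer. */
long thread_count_from(const char *value)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return 0;
    n = strtol(value, &end, 10);
    if (*end != '\0' || n <= 0)
        return 0;
    return n;
}

/* Reads the setting from the environment of the running process. */
long requested_threads(void)
{
    return thread_count_from(getenv("MP_NUMBER_OF_THREADS"));
}

/* Asks for at most four threads on this loop (assumed pragma spelling,
 * guarded so other compilers skip it). */
void add_arrays(double *a, const double *b, size_t n)
{
    size_t i;
#ifdef __hpux
#pragma _CNX prefer_parallel(max_threads=4)
#endif
    for (i = 0; i < n; i++)
        a[i] += b[i];
}
```

A file containing this code would be compiled and linked with +Oparallel at optimization level 3 or 4, as described above.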

The +Oparallel option disables +Ofailsafe.

See Also: “Transforming Loops for Parallel Execution (+Oparallel)”.

+O[no]report[=report_type]

Optimization level(s): 3, 4

Default: +Onoreport

This option causes the compiler to display various optimization reports. +Onoreport is the default. The value of report_type determines which report is displayed, as described below.

+Oreport=loop produces the Loop Report. This report gives information on optimizations performed on loops and calls. Using +Oreport (without =report_type) also produces the Loop Report.

+Oreport=private produces the Loop Report and the Privatization Table, which provides information on loop variables that are privatized by the compiler.

+Oreport=all produces all reports.

The +Oreport[=report_type] option is active only at +O3 and above. The +Onoreport option does not accept any of the report_type values. See the Parallel Programming Guide for HP-UX Systems for more information on the optimization reports.

+O[no]sharedgra

Optimization level(s): 2, 3, 4

Default: +Osharedgra

The +Onosharedgra option disables global register allocation for shared-memory variables that are visible to multiple threads. This option can help if a variable shared among parallel threads is causing wrong answers. See the Parallel Programming Guide for HP-UX Systems for more information.
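The kind of problem this option addresses can be pictured with the sketch below. It is a hand-written analogy, not compiler output: the variable and function names are hypothetical, and the second function imitates by hand what register allocation of a shared variable might do.

```c
#include <stddef.h>

/* Hypothetical variable visible to several threads. */
double shared_total;

/* Memory form: every iteration loads and stores shared_total, so a
 * thread reading it sees each partial result. */
void accumulate_in_memory(const double *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        shared_total += v[i];
}

/* Register form, written out by hand: the running value lives in a
 * local (a register) and memory is updated once at the end. Between
 * the first load and the final store, other threads see a stale value,
 * and any update they make in that window is overwritten. */
void accumulate_in_register(const double *v, size_t n)
{
    double local = shared_total;
    size_t i;
    for (i = 0; i < n; i++)
        local += v[i];
    shared_total = local;
}
```

In a single thread the two forms give the same answer; they differ only in what other threads can observe while the loop runs. Neither form replaces proper synchronization of shared data.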

© Hewlett-Packard Development Company, L.P.